NETWORK INTERFACE DEVICE WITH 10 GB/S FULL-DUPLEX TRANSFER RATE
Introduction 

[0001] The following specification describes a TCP/IP offload network interface device called Sahara, which is capable of full duplex data transfer rates of at least ten gigabits/second. This Introduction highlights a few of the features of Sahara, which are more fully described throughout the remainder of the document.

[0002] As shown in the upper-left corner of FIG. 2, data from a network can be received by a ten-gigabit TCP/IP offload network interface device (TNIC) called Sahara via a Ten-gigabit Attachment Unit Interface (XAUI). Alternatively, a XFI (10-Gbit small form factor electrical interface) optical transceiver interface or XGMII 10 Gigabit Media Independent Interface, for example, can be employed.

[0003] In the particular 10 Gb/s physical layer embodiment of XAUI, data is striped over 4 channels, encoded with an embedded clock signal, then sent in a serial fashion over differential signals. Although a 10 Gb/s data rate is targeted, higher and lower data rates are possible.

[0004] In this embodiment, the data is received from XAUI by Receive XGMII Extender Sublayer (RcvXgx) hardware, aligned, decoded, re-assembled and then presented to the Receive media access control (MAC) hardware (RcvMac). In this embodiment, the Receive MAC (RcvMac) is separated from the Transmit MAC (XmtMac), although in other embodiments the Receive and Transmit MACs may be combined.

[0005] The Receive MAC (RcvMac) performs known MAC layer functions on the data it has received, such as MAC address filtering and checking the format of the data, and stores the appropriate data and status in a Receive MAC Queue (RcvMacQ). The Receive MAC Queue (RcvMacQ) is a buffer that is located in the received data path between the Receive MAC (RcvMac) and the Receive Sequencer (RSq).

[0006] The Receive Sequencer (RSq) includes a Parser (Prs) and a Socket Detector (Det). The Parser reads the header information of each packet stored in the Receive MAC Queue (RcvMacQ). A FIFO stores IP addresses and TCP ports of the packet, which may be called a socket, as assembled by the parser. The Socket Detector (Det) uses the IP addresses and TCP ports, stored in the FIFO, to determine whether that packet corresponds to a TCP Control Block (TCB) that is being maintained by Sahara. The Socket Detector compares the packet socket information from the FIFO against TCB socket information stored in the Socket Descriptor Ram (SktDscRam) to determine TCB association of the packet. The Socket Detector (Det) may utilize a hash bucket similar to that described in U.S. Published Patent Application No. 20050182841, entitled "Generating a hash for a TCP/IP offload device," to detect the packet's TCB association. Compared to prior art TNICs that used a processor to determine that a packet corresponds to a TCB, this hardware Socket Detector (Det) frees the chip's processor for other tasks and increases the speed with which packet-TCB association can be determined.

[0007] The Receive Sequencer's (RSq) Socket Detector (Det) creates a Receive Event Descriptor for the received packet and stores the Receive Event Descriptor in a Receive Event Queue implemented in the Dma Director (Dmd) block. The Receive Event Descriptor comprises a TCB identifier (TCBID) that identifies the TCB to which the packet corresponds, and a Receive Buffer ID that identifies where, in Dram, the packet is stored. The Receive Event Descriptor also contains information derived by the Receive Sequencer (RSq), such as the Event Code (EvtCd), Dma Code (DmaCd) and Socket Receive Indicator (SkRcv). The Receive Event Queue (RcvEvtQ) is implemented by a Dma Director (Dmd) that manages a variety of queues, and the Dma Director (Dmd) notifies the Processor (CPU) of the entry of the Receive Event Descriptor in the Receive Event Queue (RcvEvtQ).
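
The socket-to-TCB association performed by the Socket Detector can be illustrated in software. The following is only a minimal sketch under stated assumptions: the bucket count, the CRC-based hash and the names `Socket`, `SocketDetector`, `add_tcb` and `detect` are hypothetical and are not taken from the specification or from the referenced hash application.

```python
import zlib
from typing import NamedTuple, Optional

NUM_BUCKETS = 256  # hypothetical bucket count, not from the specification


class Socket(NamedTuple):
    """IP addresses and TCP ports of a packet, as assembled by the parser."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def bucket_index(sock: Socket) -> int:
    # Stand-in hash (CRC32); the actual hardware hash is not described here.
    key = f"{sock.src_ip}|{sock.dst_ip}|{sock.src_port}|{sock.dst_port}".encode()
    return zlib.crc32(key) % NUM_BUCKETS


class SocketDetector:
    """Software model of a hash-bucket lookup over stored socket descriptors."""

    def __init__(self) -> None:
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def add_tcb(self, sock: Socket, tcb_id: int) -> None:
        # Store a socket descriptor for a TCB maintained by the device.
        self.buckets[bucket_index(sock)].append((sock, tcb_id))

    def detect(self, sock: Socket) -> Optional[int]:
        # Compare the packet socket against descriptors in one bucket only.
        for stored, tcb_id in self.buckets[bucket_index(sock)]:
            if stored == sock:
                return tcb_id
        return None  # packet does not correspond to a maintained TCB
```

Hashing first means only the descriptors in one bucket are compared, which is the property that makes a hardware comparison of a small descriptor group practical.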

[0008] Once the CPU has accessed the Receive Event Descriptor stored in the Receive Event Queue (RcvEvtQ), the CPU can check to see whether the TCB denoted by that descriptor is cached in Global Ram (GRm) or needs to be retrieved from outside the chip, such as off-chip memory or host memory. The CPU also schedules a DMA to bring the header from the packet located in Dram into Global Ram (GRm), which in this embodiment is dual port SRAM. The CPU then accesses the IP and TCP headers to process the frame and perform state processing that updates the corresponding TCB. The CPU contains specialized instructions and registers designed to facilitate access and processing of the headers in the header buffers by the CPU. For example, the CPU automatically computes the address for the header buffer and adds to it the value from an index register to access header fields within the header buffer.

[0009] Queues are implemented in a Queue RAM (QRm) and managed jointly by the CPU and the DMA Director (Dmd). DMA events, whether instituted by the CPU or the host, are maintained in Queue RAM (QRm) based queues.

[0010] The CPU is pipelined in this embodiment, with 8 CPUs sharing hardware and each of those CPUs occupying a different pipeline phase at a given time. The 8 CPUs also share 32 CPU Contexts. The CPU is augmented by a plurality of functional units, including Event Manager (EMg), Slow Bus Interface (Slw), Debugger (Dbg), Writable Control Store (WCS), Math Co-Processor (MCp), Lock Manager (LMg), TCB Manager (TMg) and Register File (RFl). The Event Manager (EMg) processes external events, such as DMA Completion Event (RspEvt), Interrupt Request Event (IntEvt), Receive Queue Event (RcvEvt) and others. The Slow Bus Interface (Slw) provides a means to access non-critical status and configuration registers. The Writable Control Store (WCS) includes microcode that may be rewritten. The Math Co-Processor (MCp) divides and multiplies, which may be used, for example, for TCP congestion control.

[0011] The Lock Manager (LMg) grants locks to the various CPUs, and maintains an ordered queue which stores lock requests, allowing allocation of locks as they become available. Each of the locks is defined, in hardware or firmware, to lock access of a specific function. For example, the Math Co-Processor (MCp) may require several cycles to complete an operation, during which time other CPUs are locked out from using the Math Co-Processor (MCp). Maintaining locks which are dedicated to single functions allows better performance as opposed to a general lock which serves multiple functions.
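
The behavior of per-function locks with an ordered request queue can be modeled as follows. This is a hedged sketch rather than the device's logic: the class name `LockManager`, its methods and the use of one first-in first-out queue per lock are assumptions made for illustration.

```python
from collections import deque


class LockManager:
    """Model of dedicated locks, each with an ordered queue of waiting CPUs."""

    def __init__(self, lock_names):
        self.owner = {name: None for name in lock_names}
        self.waiters = {name: deque() for name in lock_names}

    def request(self, name, cpu) -> bool:
        # Grant immediately if the lock is free, otherwise queue the request.
        if self.owner[name] is None:
            self.owner[name] = cpu
            return True
        self.waiters[name].append(cpu)
        return False

    def release(self, name, cpu):
        # Hand the lock to the oldest waiter, if any, and return the new owner.
        if self.owner[name] != cpu:
            raise ValueError("lock released by a CPU that does not hold it")
        queue = self.waiters[name]
        self.owner[name] = queue.popleft() if queue else None
        return self.owner[name]
```

Because each lock has its own queue, a CPU waiting for the Math Co-Processor lock does not delay a CPU requesting a different function's lock.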

[0012] The Event Manager (EMg) provides, to the CPU, a vector for event service, significantly reducing idle loop instruction count and service latency as opposed to single event polling performed by microcode in previous designs. That is, the Event Manager (EMg) monitors events, prioritizes the events and presents, to the CPU, a vector which is unique to the event type. The CPU uses the vector to branch to an event service routine which is dedicated to servicing the unique event type. Although the Event Manager (EMg) is configured in hardware, some flexibility is built in to enable or disable some of the events of the Event Manager (EMg).

Examples of events that the Event Manager (EMg) checks for include: a system request has occurred over an I/O bus 
such as PCI; a DMA channel has changed state; a network interface has changed state; a process has requested status 
be sent to the system; and a transmitter or receiver has stored statistics. 

[0013] As a further example, one embodiment provides a DMA event queue for each of 32 CPU contexts, and an idle bit for each CPU context indicating whether that context is idle. For the situation in which the idle bit for a context is set and the DMA event queue for that context has an event (the queue is not empty), the Event Manager (EMg) recognizes that the event needs to be serviced, and provides a vector for that service. Should the idle bit for that context not be set, instead of the Event Manager (EMg) initiating the event service, firmware that is running that context can poll the queue and service the event.
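
The idle-bit rule in the preceding paragraph can be sketched as a small model. The names `EventManager` and `next_vector`, the vector label `"dma_event"` and the choice to scan contexts from the lowest number upward are assumptions for illustration; the specification does not state the scan order.

```python
from collections import deque

NUM_CONTEXTS = 32  # one DMA event queue and one idle bit per CPU context


class EventManager:
    """Model of vector generation for DMA events of idle contexts."""

    def __init__(self) -> None:
        self.idle = [False] * NUM_CONTEXTS
        self.dma_event_q = [deque() for _ in range(NUM_CONTEXTS)]

    def next_vector(self):
        # A vector is provided only when the context is idle AND its queue
        # is not empty; a busy context polls its own queue in firmware.
        for ctx in range(NUM_CONTEXTS):  # assumed priority: lowest context first
            if self.idle[ctx] and self.dma_event_q[ctx]:
                return ("dma_event", ctx)
        return None
```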

[0014] The Event Manager (EMg) also serves CPU contexts to available CPUs, which in one embodiment can be implemented in a manner similar to the Free Buffer Server (FBS) that is described below. A CPU Context is an abstraction which represents a group of resources available to the CPUs only when operating within the context. Specifically, a context specifies a set of resources comprising CPU registers, a CPU stack, DMA descriptor buffers, a DMA event queue and a TCB lock request. When a CPU is finished with a context, it writes to a register, the CPU Context ID, which sets a flip-flop indicating that the context is free. Contexts may be busy, asleep, idle (available) or disabled.

[0015] The TCB Manager (TMg) provides hardware that manages TCB accesses by the plurality of CPUs and CPU Contexts. The TCB Manager (TMg) facilitates TCB locking and TCB caching. In one embodiment, 8 CPUs with 32 CPU Contexts can together be processing 4096 TCBs, with the TCB Manager (TMg) coordinating TCB access. The TCB Manager (TMg) manages the TCB cache, grants locks to processor contexts to work on a particular TCB, and maintains order for lock requests by processor contexts to work on a TCB that is locked.

[0016] The order that is maintained for lock requests can be affected by the priority of the request, so that high priority requests are serviced before earlier received requests of low priority. This is a special feature built into the TCB Manager (TMg) to service receive events, which are high priority events. For example, two frames corresponding to a TCB can be received from a network. While the TCB is locked by the first processor context that is processing the first receive packet, a second processor context may request a lock for the same TCB in order to process a transmit command. A third processor context may then request a lock for the same TCB in order to process the second receive frame. The third lock request is a high priority request and will be given a place in the TCB lock request chain which will cause it to be granted prior to the second, low priority, lock request. The lock requests for the TCB are chained, and when the first CPU context, holding the initial lock, gets to a place where it is convenient to release the lock of the TCB, it can query the TCB Manager (TMg) whether there are any high priority lock requests pending. The TCB Manager (TMg) then can release the lock and grant a new lock to the CPU context that is waiting to process the second receive frame.

[0017] Sequence Servers issue sequential numbers to CPUs during read operations. The numbers are used as a tag to maintain the order of receive frames, and also to provide a value to insert into the IP header Identification field of transmit frames.
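
The priority ordering of the TCB lock request chain described in paragraph [0016] can be modeled with a short sketch. The class `TcbLock` and its methods are hypothetical; the only behavior taken from the text is that a high priority request is placed ahead of earlier low priority requests and that the holder can query for pending high priority requests.

```python
class TcbLock:
    """Model of one TCB's lock with a chained, priority-aware request list."""

    def __init__(self) -> None:
        self.owner = None
        self.chain = []  # entries are (context, is_high_priority)

    def request(self, ctx, high_priority: bool = False) -> bool:
        if self.owner is None:
            self.owner = ctx
            return True
        if high_priority:
            # High priority entries always sit at the front of the chain, so
            # their count is the insertion point that keeps arrival order.
            pos = sum(1 for _, hp in self.chain if hp)
            self.chain.insert(pos, (ctx, True))
        else:
            self.chain.append((ctx, False))
        return False

    def high_priority_pending(self) -> bool:
        return any(hp for _, hp in self.chain)

    def release(self):
        # Grant the lock to the head of the chain and return the new owner.
        self.owner = self.chain.pop(0)[0] if self.chain else None
        return self.owner
```

Replaying the example from the text: the first context holds the lock, a transmit command request (low priority) arrives second, and a receive frame request (high priority) arrives third; on release the third requester is granted the lock before the second.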

[0018] Composite Registers are virtual registers comprising a concatenation of values read from or to be written to multiple single content registers. When reading a Composite Register, short fields read from multiple single content registers are aligned and merged to form a 32-bit value which can be used to quickly issue DMA and TCB Manager (TMg) commands. When writing to Composite Registers, individual single content registers are loaded with short fields which are aligned after being extracted from the 32-bit ALU output. This provides a fast method to process Receive Events and DMA Events. The single content registers can also be read and written directly without use of the Composite Register.
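
As an illustration of the merge and extract steps, the sketch below packs and unpacks the example composite layout that appears later in the list of CPU enhancements, {3'b0, 5'bCpCxId, 5'b0, 7'bCxCBId, 12'bCxTcId}. It assumes the braces use most-significant-first concatenation, so that CxTcId occupies bits 11:0; the function names are hypothetical.

```python
def read_composite(cp_cx_id: int, cx_cb_id: int, cx_tc_id: int) -> int:
    """Merge three single content registers into one 32-bit composite value."""
    return ((cp_cx_id & 0x1F) << 24) | ((cx_cb_id & 0x7F) << 12) | (cx_tc_id & 0xFFF)


def write_composite(value: int):
    """Extract the three short fields from a 32-bit ALU output."""
    cp_cx_id = (value >> 24) & 0x1F   # 5-bit field, bits 28:24
    cx_cb_id = (value >> 12) & 0x7F   # 7-bit field, bits 18:12
    cx_tc_id = value & 0xFFF          # 12-bit field, bits 11:0
    return cp_cx_id, cx_cb_id, cx_tc_id
```

The field widths sum to 32 bits (3 + 5 + 5 + 7 + 12), so a single read or write moves all three fields at once.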
[0019] A Transmit Sequencer (XSq) shown in the upper right portion of Figure 2 includes a Formatter (Fmt) and a Dispatcher (Dsp). The Transmit Sequencer (XSq) is independent of the Receive Sequencer (RSq) in this embodiment, and both can transfer data simultaneously at greater than 10 Gb/s. In some previous embodiments a device CPU running microcode would modify a prototype header in a local copy of a TCB that would then be sent by DMA to a DRAM buffer where it would be combined with data from a host for a transmit packet. A transmit sequencer could then pass the data and appended header to a MAC sequencer, which would add appropriate information and transmit the packet via a physical layer interface.

[0020] In a current embodiment, the CPU can initiate DMA of an unmodified prototype header from a host memory resident TCB to a transmit buffer and initiate DMA of transmit data from host memory to the same transmit buffer. While the DMAs are taking place, the CPU can write a transmit command, comprising a command code and header modification data, to a proxy buffer. When the DMAs have completed, the CPU can add the DMA accumulated checksum to the proxy buffer, then initiate DMA of the proxy buffer contents (transmit command) to the Transmit Command Queue (XmtCmdQ). The Transmit Sequencer (XSq) Dispatcher (Dsp) removes the transmit command from the Transmit Command Queue (XmtCmdQ) and presents it to the Dram Controller (DrmCtl), which copies the header modification portion to the XmtDmaQ, then copies header and data from the transmit buffer to the XmtDmaQ. The Transmit Sequencer (XSq) Formatter (Fmt) removes header modification data, transmit header and transmit data from the Transmit DMA Queue (XmtDmaQ), merges the header modification data with the transmit header, then forwards the modified transmit header to the Transmit Mac Queue (XmtMacQ) followed by transmit data. Transmit header and transmit data are read from the Transmit Mac Queue (XmtMacQ) by a Transmit MAC (XmtMac) for sending on XAUI.

[0021] For the situation in which device memory is sufficient to store all the TCBs handled by the device, e.g. 4096 TCBs in one embodiment, as opposed to only those TCBs that are currently cached. In one embodiment, instead of a queue of descriptors for free buffers that are available, a Free Buffer Server (FBS) is utilized that informs the CPU of buffers that are available. The Free Buffer Server (FBS) maintains a set of flip-flops that are each associated with a buffer address, with each flip-flop indicating whether its corresponding buffer is available to store data. The Free Buffer Server (FBS) can provide to the CPU the buffer address for any buffer whose flip-flop is set. The list of buffers that may be available for storing data can be divided into groups, with each of the groups having a flip-flop indicating whether any buffers are available in that group. The CPU can simply write a buffer number to the Free Buffer Server (FBS) to free a buffer, which sets a bit for that buffer and also sets a bit in the group flip-flop for that buffer. To find a free buffer, the Free Buffer Server (FBS) looks first to the group bits, and finding one that is set then proceeds to check the bits within that group, flipping the bit when a buffer is used and flipping the group bit when all the buffers in that group have been used. The Free Buffer Server (FBS) may provide one or more available free buffer addresses to the CPU in advance of the CPU's need for a free buffer or may provide free buffers in response to CPU requests.

[0022] Such a Free Buffer Server (FBS) can have N levels, with N=1 for the case in which the buffer flip-flops are not grouped. For example, 2 MB of buffer space may be divided into buffers having a minimum size that can store a packet, e.g. 1.5 KB, yielding about 1,333 buffers. In this example, the buffer identifications may be divided into 32 groups each having 32 buffers, with a flip-flop corresponding to each buffer ID and to each group. In another example, 4096 buffers can be tracked using 3 levels with 8 flip-flops each. Although the examples given are in a networking environment, such a free-buffer server may have applications in other areas and is not limited to networking.
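
A two-level Free Buffer Server of the kind described above (32 groups of 32 buffers) can be modeled in software with integers standing in for the flip-flops. This is a sketch for illustration only: the class and method names are hypothetical, and returning the lowest-numbered free buffer is an assumed policy.

```python
class FreeBufferServer:
    """Two-level model: one bit per buffer and one bit per group of buffers."""

    GROUPS = 32
    PER_GROUP = 32

    def __init__(self) -> None:
        self.buf_bits = [0] * self.GROUPS  # each int models 32 buffer flip-flops
        self.group_bits = 0                # one flip-flop per group

    def free(self, buf: int) -> None:
        # Writing a buffer number sets its bit and the bit of its group.
        group, bit = divmod(buf, self.PER_GROUP)
        self.buf_bits[group] |= 1 << bit
        self.group_bits |= 1 << group

    def allocate(self):
        # Look first at the group bits, then only inside one set group.
        if self.group_bits == 0:
            return None
        group = (self.group_bits & -self.group_bits).bit_length() - 1
        bits = self.buf_bits[group]
        bit = (bits & -bits).bit_length() - 1
        self.buf_bits[group] &= ~(1 << bit)      # flip the buffer bit when used
        if self.buf_bits[group] == 0:
            self.group_bits &= ~(1 << group)     # flip group bit when group is used up
        return group * self.PER_GROUP + bit
```

The search inspects at most one group word after the group bits, rather than scanning all 1,024 buffer bits.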

[0023] The host interface in this embodiment is an eight channel implementation of PciExpress (PciE) which provides 16Gb of send and 16Gb of receive bandwidth. Similar in functional concept to previous Alacritech TNICs, Sahara differs substantially in its architectural implementation. The receive and transmit data paths have been separated to facilitate greater performance. The receive path includes a new socket detection function mentioned above, and the transmit path adds a formatter function, both serving to significantly reduce firmware instruction count. Queue access is now accomplished in a single atomic cycle unless the queue-indirect feature is utilized. As mentioned above, a TCB management function has been added which integrates the cam, chaining and TCB Lock functions as well as Cache Buffer allocation. A new event manager function reduces idle-loop instruction count to just a few instructions. New statistics registers automatically accumulate receive and transmit vectors. The receive parsing function includes multicast filtering and, for support of receive-side scaling, a Toeplitz hash generator. The Director provides compact requests for initiating TCB, SGL and header DMAs. A new CPU increases the number of pipeline stages to eight, resulting in single instruction ram accesses while improving operating frequency. Adding even more to performance are the following enhancements of the CPU.

♦ 32-bit literal instruction field.
♦ 16-bit literal with 16-bit jump address.
♦ Dedicated ram-address literal field.
♦ Independent src/dst operands.
♦ Composite registers, e.g. {3'b0, 5'bCpCxId, 5'b0, 7'bCxCBId, 12'bCxTcId}.
♦ Per-CPU file address registers.
  - CPU-mapped file operands.
♦ Per-CPU Context ID registers.
  - Context-mapped file operands.
  - Context-mapped ram operands.
  - Per-context pc stacks.
  - Per-context file address registers.
  - Per-context ram address registers.
  - Per-context TCB ID registers.
  - Per-context Cache Buffer ID registers.
  - CchBuf-mapped ram operands.
  - Per-context Header Buffer ID registers.
  - Per-context Header Buffer index registers.
  - HdrBuf-mapped ram operands.
♦ Per-CPU queue ID register.
  - Queue-mapped file operands.
  - Queue-direct file operands.

Parity has been implemented for all internal rams to ensure data integrity. This has become important as silicon geometries decrease and alpha particle induced errors increase.

Sahara employs several industry standard interfaces for connection to network, host and memory. Following is a list of interface/transceiver standards employed:

Spec     Xcvrs     Pins   Attachment   Description
XAUI     8-CML            XGbe Phy     10Gb Attachment Unit Interface.
MGTIO    -LVTTL           Phy          Management I/O.
PCI-E    ??-LVDS          Host         Pci Express.
RLDRAM   ??-HSTL          RLDRAM       Reduced Latency DRAM.
SPI      -LVTTL           FLASH MEM    Serial Peripheral Interface.

Sahara is implemented using flip-chip technology which provides a few important benefits. This technology allows strategic placement of I/O cells across the chip, ensuring that the die area is not pad-limited. The greater freedom of I/O cell and ram cell placement also reduces connecting wire length, thereby improving operating frequency.

External devices are employed to form a complete TNIC solution. FIG. 1 shows some external devices that can be employed. In one embodiment the following memory sizes can be implemented.

♦ Dcmm - 2 X 8M X 36 - Receive RLDRAM (64MB total).
♦ Drmm - 2 X 4M X 36 - Transmit RLDRAM (32MB total).
♦ Flsh - 1 X 1M X 1 - Spi Memory (Flash or EEProm).
♦ RBuf - Registered double data rate buffers.
♦ Xpak - Xpak fiberoptic transceiver module.

FUNCTIONAL DESCRIPTION 

A functional block diagram of Sahara is shown in Figure 2. Only data paths are illustrated. Functions have been defined to allow asynchronous communication with other functions. This results in smaller clock domains (the clock domain boundaries are shown with dashed lines) which minimize clock tree leaves and geographical area. The result is better skew margins, higher operating frequency and reduced power consumption. Also, independent clock trees will allow selection of optimal operating frequencies for each domain and will also facilitate improvements in various power management states. Wires which span functional clocks are no longer synchronous, again resulting in improved operating frequencies.

Sahara comprises the following functional blocks and storage elements: 

♦ XgxRcvDes - XGXS Deserializer.
♦ XgxXmtSer - XGXS Serializer.
♦ XgeRcvMac - XGbe Receive Mac.
♦ XgeXmtMac - XGbe Transmit Mac.
♦ RSq - Receive Sequencer.
♦ XSq - Transmit Sequencer.
♦ DrmCtl - Dram Control.
♦ QMg - Queue Manager.
♦ CPU - Central Processing Unit.
♦ Dmd - DMA Director.
♦ BIA - Host Bus Interface Adaptor.
♦ PciRcvPhy - Pci Express Receive Phy.
♦ PciXmtPhy - Pci Express Transmit Phy.
♦ PCIeCore - Pci Express Core.
♦ MgtCtl - Phy Management I/O Control.
♦ SpiCtl - Spi Memory Control.
♦ GlobalRam - 4 X 8K X 36 - Global Ram (GlbRam/GRm).
♦ QueMgrRam - 2 X 8K X 36 - Queue Manager Ram (QRm).
♦ ParityRam - 1 X 16K X 16 - Dram Parity Ram (PRm).
♦ CpuRFlRam - 2 X 2K X 36 - CPU Register File Ram (RFl).
♦ CpuWCSRam - 1 X 8K X 108 - CPU Writeable Control Store (WCS).
♦ SktDscRam - 1 X 2K X 288 - RSq Socket Descriptor Ram.
♦ RcvMacQue - 1 X 64 X 36 - Receive Mac Data Queue Ram.
♦ XmtMacQue - 1 X 64 X 36 - Transmit Mac Data Queue Ram.
♦ XmtVecQue - 1 X 64 X 36 - Transmit Mac Vector (Stats) Queue Ram.
♦ XmtCmdQHi - 1 X 128 X 145 - Transmit Command Q - high priority.
♦ XmtCmdQLo - 1 X 128 X 145 - Transmit Command Q - low priority.
♦ RcvDmaQue - 1 X 128 X 145 - Parse Sequencer Dma Fifo Ram.
♦ XmtDmaQue - 1 X 128 X 145 - Format Sequencer Dma Fifo Ram.
♦ D2gDmaQue - 1 X 128 X 145 - Dram to Global Ram Dma Fifo Ram.
♦ D2hDmaQue - 1 X 128 X 145 - Dram to Host Dma Fifo Ram.
♦ G2dDmaQue - 1 X 128 X 145 - Global Ram to Dram Dma Fifo Ram.
♦ G2hDmaQue - 1 X 128 X 145 - Global Ram to Host Dma Fifo Ram.
♦ H2dDmaQue - 1 X 128 X 145 - Host to Dram Dma Fifo Ram.
♦ H2gDmaQue - 1 X 128 X 145 - Host to Global Ram Dma Fifo Ram.
♦ PciHdrRam - 1 X 68 X 109 - PCI Header Ram.
♦ PciRtyRam - 1 X 256 X 69 - PCI Retry Ram.
♦ PciDatRam - 1 X 182 X 72 - PCI Data Ram.
Functional Synopsis 



In short Sahara performs all the functions of a traditional NIC as well as performing offload of 

The CPU manages all functions except for host access of flash memory, phy management registers and pci configuration registers.

Frames which do not include IP datagrams are processed as would occur with a non-offload NIC. Receive frames are filtered based on link address and errors, then transferred to preallocated receive buffers within host memory. Outbound frames are retrieved from host memory, then transmitted.



http://ww.wipo.int/pctdb/eii/fetch.jsp?SEARCH_IA=US2007010665&D^ 8/10/2010 



(WO/2007/130476) NETWORK INTERFACE DEVICE WITH 10 GB/S FULL-DUPLE... Page 



Frames which include IP datagrams but do not include TCP segments are transmitted without any protocol offload, but received frames are parsed and checked for protocol errors. Receive frames without datagram errors are passed to the host and error frames are dumped. Checksum accumulation is also supported for IP datagram frames containing UDP segments.

Frames which include TCP segments are parsed and checked for errors. Hardware checking is then performed for socket ownership. TCP/IP frames which pass the ownership test are processed by the finite state machine (FSM) which is implemented by the TNIC CPU. TCP/IP frames for non-owned sockets are supported with checksum accumulation/insertion.

The following is a description of the steps which occur while processing a receive frame.

Receive Mac

1) Store incoming frame in RcvMacQue.
2) Perform link level parsing while receiving/storing incoming frame.
3) Save receive status/vector information as a mac trailer in RcvMacQue.

Receive Sequencer

1) Obtain RbfId from RcvBufQ (specifies receive buffer location in RcvDrm).
2) Retrieve frame data from RcvMacQue and perform link layer parsing.
3) Filter frame reception based on link address.
4) Dump if filtered packet, else save frame data in receive buffer.
5) Parse mac trailer.
6) Save a parse header at the start of the receive buffer.
7) Update RcvStatsR.
8) Select a socket descriptor group using the socket hash.
9) Compare socket descriptors within group against parsed socket ID to test for a match.
10) If a match is found, set TcbId and extract DmaCd from SktDsc, else set both to zero.
11) Store entry on the RSqEvtQ {RSqEvt, DmaCd, SkRcv, TcbId, RbfId}.

CPU

1) Pop event descriptor from RSqEvtQ.
2) Jump, if marker event, to marker service routine.
3) Jump, if raw receive, to raw service routine.
4) Use TcbMgr to request lock of TCB.
5) Continue if TCB grant, else Jsx to idle loop.
6) Jump, if !TcbRcvBsy, to 12.
7) Put RbfId on to TCB receive queue.
8) Use TcbMgr to release TCB and get next owner.
9) Release current context.
10) Jump, if owner not valid, to idle.
11) Switch to next owner and Rtx.
12) Schedule TCB DMA if needed.
13) Schedule header DMA.
14) Magic stuff.

The following is a description of the steps which occur while processing a transmit frame.

CPU

1) Use TcbMgr to request lock of TCB.
2) If not TCB grant, Jsx to idle loop.
3) Magic stuff here.
4) Schedule H2dDma.
5) Pop Proxy Buffer Address (PxyAd) off of Proxy Buffer Queue (PxyBufQ).
6) Partially assemble formatter command variables in PxyBuf.
7) If not H2dDmaDn, Jsx to idle loop.
8) Check H2dDma ending status.
9) Finish assembling formatter command variables (Chksum+) in PxyBuf.
10) Write Proxy Command {PxySz, QueId, PxyAd} to Proxy Dispatch Queue (PxyCmdQ).
11) Magic stuff here.

Proxy Agent

1) Pop PxyCmd off of PxyCmdQ.
2) Retrieve transmit descriptor from specified PxyBuf.
3) Push transmit descriptor on to specified transmit queue.
4) Push PxyAd on to PxyBufQ.

Transmit Sequencer

1) Pop transmit descriptor off of transmit queue.
2) Copy protoheader to XmtDmaQue.
3) Modify protoheader while copying to XmtMacQue.
4) Release protoheader to XmtMac (increment XmtFmtSeq).
5) Copy data from transmit buffer to XmtDmaQue.
6) Copy data from transmit buffer to XmtMacQue.
7) Write EOP and DMA status to XmtMacQue.
8) Push XmtBuf on to XmtBufQ to release transmit buffer.

Transmit Mac

1) Wait for transmit packet ready (XmtFmtSeq > XmtMacSeq).
2) Pop data off of XmtMacQue and send until EOP/Status encountered.
3) If no DMA error, send good crc, else send bad crc to void frame.
4) Increment XmtMacSeq.
5) Load transmit status into XMacVecR and flip XmtVecRdy.

Transmit Sequencer

1) If XmtVecRdy != XmtVecSvc, read XMacVecR and update XSNMPRgs.
2) Flip XmtVecSvc.
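
The two handshakes used in the transmit steps above, the sequence-count comparison (XmtFmtSeq > XmtMacSeq) and the toggle pair (XmtVecRdy versus XmtVecSvc), can be modeled as follows. This is a simplified sketch: the class `XmtHandshake`, its method names and the `stats` list standing in for XSNMPRgs are hypothetical, and counter wrap-around is ignored.

```python
class XmtHandshake:
    """Model of the formatter/Mac sequence counters and the vector toggles."""

    def __init__(self) -> None:
        self.xmt_fmt_seq = 0       # incremented when a protoheader is released
        self.xmt_mac_seq = 0       # incremented when the Mac has sent a packet
        self.xmt_vec_rdy = False   # flipped by the Mac when status is loaded
        self.xmt_vec_svc = False   # flipped by the sequencer after servicing
        self.x_mac_vec_r = None    # transmit status register
        self.stats = []            # stand-in for the statistics registers

    def formatter_release(self) -> None:
        self.xmt_fmt_seq += 1

    def mac_send(self, status) -> bool:
        # The Mac proceeds only when a released packet is outstanding.
        if not self.xmt_fmt_seq > self.xmt_mac_seq:
            return False
        self.xmt_mac_seq += 1
        self.x_mac_vec_r = status
        self.xmt_vec_rdy = not self.xmt_vec_rdy
        return True

    def sequencer_service(self) -> bool:
        # Unequal toggles mean a status vector is waiting to be read.
        if self.xmt_vec_rdy != self.xmt_vec_svc:
            self.stats.append(self.x_mac_vec_r)
            self.xmt_vec_svc = not self.xmt_vec_svc
            return True
        return False
```

Each side writes only its own counter or toggle, so neither has to clear a flag owned by the other.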

HOST MEMORY (HstMem) Data Structures 

Host memory provides storage for control data and packet payload. Host memory data structures have been defined which facilitate communication between Sahara and the host system. Sahara hardware includes automatic computation of address and size for access of these data structures, resulting in a significant reduction of firmware overhead. These data structures are defined below.
TCP Control Block
TCBs comprise constants, cached-variables and delegated-variables which are stored in host memory based TCB Buffers (TcbBuf) that are fixed in size at 512B. A diagram of TCB Buffer space is shown in FIG. 3, and a TCB Buffer is shown in FIG. 4. The TCB varies in size based on the version of IP or host software but in any case may not exceed the 512B limitation imposed by the size of the TcbBuf.

TCBs are copied as needed into GlbRam based TCB Cache Buffers (CchBuf) for direct access by the CPUs. A special DMA operation is implemented which copies the TCB structure from TcbBuf to CchBuf using an address calculated with the configuration constant, TCB Buffer Base Address (TcbBBs), and the TcbBuf size of 512B. The DMA size is determined by the configuration constant, H2gTcbSz.

Constants and cached-variables are read-only, but delegated variables may be modified by the CPUs while the TCB is cached. All TCBs are eventually flushed from the cache, at which time, if any delegated-variable has been modified, the changed variable must be copied back to the TcbBuf. This is accomplished with a special DMA operation which copies to the TcbBuf, from the CchBuf, all delegated variables and incidental cached variables up to the next 32B boundary. The DMA operation copies an amount of data determined by the configuration constant G2hTcbSz. This constant should be set to a multiple of 32B to preclude read-modify-write operations by the host memory controller. To this same end, delegated variables are located at the beginning of the TcbBuf to ensure that DMAs start at a 64-byte boundary. Refer to sections Global Ram, DMA Director and Slow Bus Controller for additional information.
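
The address and size rules for TCB DMAs can be illustrated with a short calculation. The text states only that the address is calculated from TcbBBs and the 512B TcbBuf size; the linear indexing by a TCB identifier shown here, and the helper names, are assumptions.

```python
TCB_BUF_SIZE = 512  # fixed TcbBuf size in bytes


def tcb_buf_address(tcb_bbs: int, tcb_id: int) -> int:
    """Host address of a TcbBuf, assuming buffers are laid out contiguously."""
    return tcb_bbs + tcb_id * TCB_BUF_SIZE


def valid_g2h_tcb_sz(size: int) -> bool:
    """G2hTcbSz should be a multiple of 32B and fit within the TcbBuf."""
    return 0 < size <= TCB_BUF_SIZE and size % 32 == 0
```

Since 512 is a multiple of 64, any contiguous layout that starts at a 64-byte aligned TcbBBs keeps every TcbBuf, and therefore the delegated variables at its start, on a 64-byte boundary.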

Prototype Headers 

Every connection has a Prototype Header (PHdr) which is not cached in GlbRam but is instead copied from host memory to DRAM transmit buffers as needed. Headers for all connections reside in individual 1 KB Composite Buffers (CmpBuf, FIG. 6) which are located in contiguous physical memory of the host as shown in FIG. 5.

A composite buffer comprises two separate areas, with the first 256-byte area reserved for storage of the prototype header and the second 768-byte area reserved for storage of the TCB Receive Queue (TRQ). Although the PHdr size and TRQ size may vary, the CmpBuf size remains constant.

Special DMA operations have been defined for copying prototype headers to transmit buffers. A host address is computed using the configuration constant Composite Buffer Base Address (CmpBBs), with a fixed buffer size of 1 KB. Another configuration constant, prototype-header transmit DMA-size (H2dHdrSz), indicates the size of the copy. Refer to sections DMA Director and Slow Bus Controller for additional information.
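
The composite buffer layout lends itself to a simple address calculation. As a sketch under stated assumptions, the functions below index the contiguous 1 KB buffers by a connection number; that indexing and the function names are hypothetical, while the 256-byte and 768-byte areas come from the text.

```python
CMP_BUF_SIZE = 1024  # fixed composite buffer size in bytes
PHDR_AREA = 256      # first area: prototype header
TRQ_AREA = 768       # second area: TCB Receive Queue


def phdr_address(cmp_bbs: int, conn_id: int) -> int:
    """Host address of a connection's prototype header (start of its CmpBuf)."""
    return cmp_bbs + conn_id * CMP_BUF_SIZE


def trq_address(cmp_bbs: int, conn_id: int) -> int:
    """Host address of the TRQ area, which follows the prototype header area."""
    return phdr_address(cmp_bbs, conn_id) + PHDR_AREA
```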

TCB Receive Queue 

Every connection has a unique TCB Receive Queue (TRQ) in which to store information about buffered receive packets or frames. The TRQ is allocated storage space in the TRQ reserved area of the composite buffers previously defined. The TRQ size is programmable and can be up to 768 bytes deep, allowing storage of up to 192 32-bit descriptors. This is slightly more than needed to support a 256KB window size assuming 1448-byte payloads with the timestamp option enabled.

When a TCB is ejected from or imported to a GlbRam TCB Cache Buffer (CchBuf), its corresponding receive queue may or may not contain entries. The receive queue can be substantially larger than the TCB and therefore contribute greatly to latency. It is for this reason that the receive queue is copied only when it contains entries. It is expected that this DMA seldom occurs and therefore there is no special DMA support provided.
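
The TRQ sizing claim can be checked with a few lines of arithmetic. The calculation assumes one descriptor per buffered packet and takes 256KB as 256 x 1024 bytes; both are assumptions made for this check.

```python
import math

TRQ_BYTES = 768             # maximum TRQ depth
DESCRIPTOR_BYTES = 4        # 32-bit descriptors
WINDOW_BYTES = 256 * 1024   # 256KB window
PAYLOAD_BYTES = 1448        # payload per packet with the timestamp option

trq_capacity = TRQ_BYTES // DESCRIPTOR_BYTES
descriptors_needed = math.ceil(WINDOW_BYTES / PAYLOAD_BYTES)

print(trq_capacity, descriptors_needed)
```

A full window of 1448-byte payloads needs 182 descriptors, so the 192-entry queue leaves a margin of 10 entries.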

Transmit Commands. 

Transmit Command Descriptors (XmtCmd, FIG. 7) are retrieved from host memory resident transmit command rings (XmtRng, FIG. 8). Transmit Ring space is shown in FIG. 9. A XmtRng is implemented for each connection. The size is configurable up to a maximum of 256 entries. The descriptors indicate data transfers for offloaded connections and for raw packets.

The command descriptor includes a Scatter-Gather List Read Pointer (SglPtr, Fig. xx-a), a 4-byte reserved field, a 2-byte Flags field (Flgs), a 2-byte List Length field (LCnt), a 12-byte memory descriptor and a 4-byte reserved field. The definition of the contents of Flgs is beyond the scope of this document. The SglPtr is used to fetch page descriptors from a scatter-gather list and points to the second page descriptor of the list. MemDsc[0] is a copy of the first entry in the SGL and is placed here to reduce latency by consolidating what would otherwise be two DMAs. LCnt indicates the number of entries in the SGL and includes MemDsc[0]. A value of zero indicates that no data is to be transferred.

The host compiles the command descriptor or descriptors in the appropriate ring then notifies Sahara of the new
command(s) by writing a value, indicating the number of new command descriptors, to the transmit tickle register of the
targeted connection. Microcode adds this incremental value to a Transmit Ring Count (XRngCnt) variable in the cached
TCB. Microcode determines command descriptor readiness by testing XRngCnt and decrements it each time a command
is fetched from the ring.

Commands are fetched using an address computed with the Transmit Ring Pointer (XRngPtr), fetched from the cached
TCB, and the configuration constants Transmit Ring Base address (XRngBs) and Transmit Ring Size (XRngSz). XRngPtr
is then incremented by the DMA Director. Refer to sections Global Ram, DMA Director and Slow Bus Controller for
additional information.
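
The tickle, count and pointer bookkeeping described above can be modelled in a few lines. This is a minimal sketch, not the microcode: the class and method names are hypothetical, and wrapping the pointer modulo the ring size is an assumption. The receive rings use the same scheme with RRngCnt and RRngPtr.

```python
class RingState:
    """Toy model of the per-connection ring variables kept in the cached TCB."""

    def __init__(self, ring_entries=256):
        self.ring_entries = ring_entries  # configurable, up to 256 entries
        self.cnt = 0                      # XRngCnt: commands posted, not yet fetched
        self.ptr = 0                      # XRngPtr: index of the next command to fetch

    def tickle(self, new_commands):
        """Host wrote the tickle register; microcode adds the value to the count."""
        self.cnt += new_commands

    def fetch(self):
        """Return the ring index of the next command, or None if none is ready."""
        if self.cnt == 0:
            return None
        index = self.ptr
        self.ptr = (self.ptr + 1) % self.ring_entries  # assumed wrap behaviour
        self.cnt -= 1
        return index


if __name__ == "__main__":
    ring = RingState()
    ring.tickle(2)
    print(ring.fetch(), ring.fetch(), ring.fetch())
```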

Receive Commands. 






Receive Command Descriptors (RcvCmd, FIG. 10) are retrieved from host memory resident Receive Command Rings
(RcvRng, FIG. 11). Receive Ring space is shown in FIG. 12. A RcvRng is implemented for each connection. The size is
configurable up to a maximum of 256 entries. The descriptors indicate data transfers for offloaded connections and the
availability of buffer descriptor blocks for the receive buffer pool.

The descriptors are basically identical to those used for transmit except for the definition of the contents of the 2-byte
Flags field (Flgs). Connection 0 is a special case used to indicate that a block of buffer descriptors is available for the
general receive buffer pool. In this case SglPtr points to the first descriptor in the block of buffer descriptors. Each buffer
descriptor contains a 64-bit physical address and a 64-bit virtual address. LCnt indicates the number of descriptors in the
list and must be the same for every list. Furthermore, LCnt must be a whole fraction of the size of the System Buffer
Descriptor Queue (SbfDscQ) which resides in Global Ram. Use of other lengths will result in DMA fragmentation at the
SbfDscQ memory boundaries.

The host compiles the command descriptor or descriptors in the appropriate ring then notifies Sahara of the new
command(s) by writing a value, indicating the number of new command descriptors, to the receive tickle register of the
targeted connection. Microcode adds this incremental value to a Receive Ring Count (RRngCnt) variable in the cached
TCB. Microcode determines command readiness by testing RRngCnt and decrements it each time a command is fetched
from the ring.

Commands are fetched using an address computed with the Receive Ring Pointer (RRngPtr), fetched from the cached 
TCB and the configuration constants Receive Ring Base address (RRngBs) and Receive Ring Size (RRngSz). RRngPtr 
is then incremented by the DMA Director. Refer to sections Global Ram, DMA Director and Slow Bus Controller for 
additional information.

Scatter-Gather Lists. 

A Page Descriptor is shown in FIG. 13, and a Scatter-Gather List is shown in FIG. 14. Applications send and receive data
through buffers which reside in virtual memory. This virtual memory comprises pages of segmented physical memory
which can be defined by a group of Memory Descriptors (MemDsc, FIG. 13). This group is referred to as a Scatter-Gather
List (SGL, FIG. 14). The SGL is passed to Sahara via a pointer (SglPtr) included in a transmit or receive descriptor.

Memory descriptors in the host include an 8-byte Physical Address (PhyAd, FIG. 13), a 4-byte Memory Length (Len) and an
8-byte reserved area which is not used by Sahara. Special DMA commands are implemented which use an SglPtr that is
automatically fetched from a TCB cache buffer. Refer to section DMA Director for additional information.

System Buffer Descriptor Lists. 

A System Buffer Descriptor is shown in FIG. 15. Raw receive packets and slow path data are copied to system buffers
which are taken from a general system receive buffer pool. These buffers are handed off to Sahara by compiling a list of 
System Buffer Descriptors (SbfDsc, Fig. xx) and then passing a pointer through the receive ring of connection 0. Sahara 
keeps a Receive Ring Pointer (RRngPtr) and Receive Ring Count (RRngCnt) for the receive rings which allows fetching a 
buffer descriptor block pointer and subsequently the block of descriptors. 

The buffer descriptor comprises a Physical Address (PhyAd) and Virtual Address (VirAd) for a 2KB buffer. The physical 
address is used to write data to the host memory and the virtual address is passed back to the host to be used to access 
the data. 

Microcode schedules, as needed, a DMA of a SbfDsc list into the SbfDsc list staging area of the GlbRam. Microcode then
removes individual descriptors from the list and places them onto context specific buffer descriptor queues until all queues
are full. This method of serving descriptors reduces critical receive microcode overhead since the critical path code does
not need to lock a global queue and copy a descriptor to a private area.

NIC Event Queues. 

Event notification is sent to the host by writing NIC Event Descriptors (NEvtDsc, FIG. 16) to the NIC Event Queues
(NicEvtQ, FIG. 17). Eight NicEvtQs (FIG. 18) are implemented to allow distribution of events among multiple host CPUs.

The NEvtDsc is fixed at a size of 32 bytes which includes eight bytes of data, a two-byte TCB Identifier (TcbId), a two-byte
Event Code (EvtCd) and a four-byte Event Status (EvtSta). EvtSta is positioned at the end of the structure so that it is written
last, because it functions as an event valid indication for the host. The definitions of the various field contents are beyond
the scope of this document.

Configuration constants are used to define the queues. These are NIC Event Queue Size (NEQSz) and NIC Event Queue
Base Address (NEQBs) which are defined in section Slow Bus Controller. The CPU includes a
pair of sequence registers, NIC Event Queue Write Sequence (NEQWrtSq) and NIC Event Queue Release Sequence
(NEQRlsSq), for each NicEvtQ. These also function as read and write pointers. Sahara increments NEQWrtSq for each
write to the event queue. The host sends a release count of 32 to Sahara each time 32 queue entries have been vacated.
Sahara adds this value to NEQRlsSq to keep track of empty queue locations. Additional information can be found in
sections CPU Operands and DMA Director.
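
The write/release sequence accounting described above can be sketched as follows. This is an illustrative model only: the class name is hypothetical, the sequences are treated as free-running integers (real registers have a finite width and wrap), and refusing a write when the queue is full is an assumed policy.

```python
class NicEventQueue:
    """Toy model of one NicEvtQ tracked by a write and a release sequence."""

    def __init__(self, size):
        self.size = size     # NEQSz, in entries
        self.wrt_sq = 0      # NEQWrtSq: incremented for each event written
        self.rls_sq = 0      # NEQRlsSq: advanced by host release counts

    def free_slots(self):
        """Empty queue locations = size minus events written but not released."""
        return self.size - (self.wrt_sq - self.rls_sq)

    def write_event(self):
        """Return the slot index used for the next event, or None if full."""
        if self.free_slots() == 0:
            return None
        slot = self.wrt_sq % self.size
        self.wrt_sq += 1
        return slot

    def release(self, count=32):
        """Host reports that `count` entries have been vacated."""
        self.rls_sq += count


if __name__ == "__main__":
    q = NicEventQueue(64)
    for _ in range(64):
        q.write_event()
    print(q.free_slots())   # queue is full until the host releases entries
    q.release(32)
    print(q.free_slots())
```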

GLOBAL RAM (GlbRam/GRm)

GlbRam, a 128KB dual port static RAM, provides working memory for the CPU. The CPU has exclusive access to a single
port, ensuring zero wait access. The second port is used exclusively during DMA operations for the movement of data,
commands and status. GlbRam may be written with data units as small as a byte and as large as 8 bytes. All data is
protected with byte parity, ensuring detection of all single bit errors.

Multiple data structures have been pre-defined, allowing structure specific DMA operations to be implemented. Also, the
predefined structures allow the CPU to automatically compile GlbRam addresses using contents of both configuration
registers and dynamic registers. The resulting effect is reduced CPU overhead. The following list shows the structures and
the memory used by them. Any additional structures may reduce the quantity or size of the TCB cache buffers.

HdrBufs    8KB = 128B/Hbf * 2 Hbfs/Ctx * 32 Ctxs - Header Buffers.
DmaDscs    4KB = 16B/Dbf * 8 Dbfs/Ctx * 32 Ctxs - DMA Descriptor Buffers.
SbfDscs    4KB = 16B/Sbf * 8 Sbfs/Ctx * 32 Ctxs - System Buffer Descriptors.
PxyBufs    2KB = 32B/Pbf * 64 Pbfs - Proxy Buffers.
TcbBMap   512B = 1b/TCB * 4K TCBs/Map / 8b/B - TCB Bit Map.
CchBufs  109KB = 1KB/Cbf * 109 Cbfs - TCB Cache Buffers.

         128KB

Header Buffers 

FIG. 19 shows Header Buffer Space. Receive packet processing uses the DMA of headers from the DRAM receive buffers
(RcvBuf/Rbf) to GlbRam, to which the CPUs have immediate access. An area of GlbRam has been partitioned into buffers
(HdrBuf/Hbf, Fig. xx) for the purpose of holding these headers. Each CPU context is assigned two of these buffers and
each CPU context has a Header Buffer ID (HbfId) register that indicates which buffer is active. While one header is being
processed another header can be pre-fetched, thereby reducing latency when processing sequential frames.

Configuration constants define the buffers. They are Header Buffer Base Address (HdrBBs) and Header Buffer Size
(HdrBSz). The maximum buffer size allowed is 256B.

Special CPU operands have been provided which automatically compile addresses for the header buffer area. Refer to
section CPU Operands for additional information.

A special DMA is implemented which allows efficient initiation of a copy from DRAM to HdrBuf. Refer to section DMA 
Director for additional information. 

TCB Valid Bit Map 

A bit-map (FIG. 20) is implemented in GlbRam wherein each bit indicates that a TCB contains valid data. This area is
pre-defined by configuration constant TCB Map Base Address (TMapBs) to allow hardware assistance. CPU operands have
been defined which utilize the contents of the TcbId registers to automatically compute a GlbRam address. Refer to CPU
Operands and Slow Bus Controller for additional information.
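
One plausible way to derive a GlbRam byte address and bit position from a TcbId is sketched below. The specification does not give the bit ordering, so least-significant-bit-first packing within each byte is an assumption, and the function names are hypothetical.

```python
TCB_COUNT = 4096  # 4K TCBs per map, i.e. a 512-byte bit map


def tcb_valid_bit_location(tcb_id, tmap_bs):
    """Return (GlbRam byte address, bit index) for a TCB's valid bit.

    Assumes 8 TCBs per byte with bit 0 holding the lowest TcbId.
    """
    if not 0 <= tcb_id < TCB_COUNT:
        raise ValueError("TcbId out of range")
    return tmap_bs + (tcb_id >> 3), tcb_id & 7


def set_valid(bitmap, tcb_id):
    """Mark a TCB valid in a bytearray standing in for the bit-map area."""
    offset, bit = tcb_valid_bit_location(tcb_id, 0)
    bitmap[offset] |= 1 << bit


def is_valid(bitmap, tcb_id):
    """Test a TCB's valid bit."""
    offset, bit = tcb_valid_bit_location(tcb_id, 0)
    return bool(bitmap[offset] & (1 << bit))


if __name__ == "__main__":
    bitmap = bytearray(TCB_COUNT // 8)
    set_valid(bitmap, 1234)
    print(is_valid(bitmap, 1234), is_valid(bitmap, 1235))
```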

Proxy Buffers 

Transmit packet processing uses assembly of transmit descriptors which are deposited into transmit command queues. Up
to 32 bytes (8 entries) can be written to the transmit queue while maintaining exclusive access. In
order to avoid spin-lock during queue access, a proxy DMA has been provided which copies contents of proxy buffers
from GlbRam to the transmit command queues. Sixty-four proxy buffers of 32 bytes each are defined by microcode and
identified by their starting address. Refer to sections DMA Director and Transmit Operation for additional information.

System Buffer Descriptor Stage 

Raw frames and slow path packets are delivered to the system stack via System Buffers (SysBuf/Sbf). These buffers are
defined by System Buffer Descriptors (SbfDsc, see prior section System Buffer Descriptor Lists) comprising an 8-byte
physical address and an 8-byte virtual address. The system assembles 128 SbfDscs into a 2KB list then deposits a pointer
to this list on to RcvRng 0. The system then notifies microcode by writing to Sahara's tickle register. Microcode copies the
lists as needed into a staging area of GlbRam from which individual descriptors will be distributed to each CPU context's
system buffer descriptor queue. This stage is 2KB to accommodate a single list.

DMA Descriptor Buffers 

FIG. 21 shows DMA Descriptor Buffer Space. The DMA Director accepts DMA commands which utilize a 16-byte
descriptor (DmaDsc) compiled into a buffer (DmaBuf/Dbf) in GlbRam. There are 8 descriptor buffers available to each
CPU context for a total of 256 buffers. Each of the 8 buffers corresponds to a DMA context such that a concatenation of
CPU Context and DMA Context {DmaCx.CpuCx} selects a unique DmaBuf.

CPU operands have been defined which allow indirect addressing of the buffers. See section CPU Operands for more
information. Configuration constant DMA Descriptor Buffer Base Address (DmaBBs) defines the starting address in
GlbRam. The DMA Director uses the CpuCx and DmaCx provided via the Channel Command Queues (CCQ) to retrieve a
descriptor when required.

Event mode DMAs also access DmaBufs but do so for a different purpose. Event descriptors are written to the host
memory resident NIC Event Queues. They are fetched from the DmaBuf by the DMA Director, but are passed on as data
instead of being used as extended command descriptors. Event mode utilizes two consecutive DmaBufs since event
descriptors are 32 bytes long. It is recommended that DmaCxs 6 and 7 be reserved exclusively for this purpose.

TCB Cache Buffers 

A 12-bit identifier (TcbId) allows up to 4095 connections to be actively supported by Sahara. Connection 0 is reserved for
raw packet transmit and system buffer passing. These connections are defined by a collection of variables and constants
which are arranged in a structure known as a TCP Control Block (TCB). The size of this structure and the number of
connections supported preclude immediate access to all of them simultaneously by the CPU due to practical limitations on
local memory capacity. A TCB caching scheme provides a solution with reasonable tradeoffs between local memory size
and the quantity of connections supported. FIG. 22 shows Cache Buffer Space.

Most of GlbRam is allocated to TCB Cache Buffers (CchBuf/Cbf), leaving primary storage of TCBs in inexpensive host
DRAM. In addition to storing the TCB structure, these CchBufs provide storage for the TCB Receive Queue (TRQ) and an
optional Prototype Header (Phd). The Phd storage option is intended as a fallback in the event that problems are
encountered with the transmit sequencer proxy method of header modification. FIG. 23 shows a Prototype Header Buffer.

CchBufs are represented by cache buffer identifiers (CbfId). Each CpuCtx has a specialized register (CxCbfId) which is
dedicated to containing the currently selected CbfId. This value is utilized by the DMA Director, TCB Manager and by the
CPU for special memory accesses. CbfId represents a GlbRam resident buffer which has been defined by the
configuration constants cache buffer base address (CchBBs) and cache buffer size (CchBSz).

The CPU and the DMA Director access structures and variables in the CchBufs using a combination of hard constants,
configuration constants and variable register contents. TRQ access by the CPU is facilitated by the contents of the
specialized Connection Control register, CxCCtl, which holds the read and write sequences for the TRQ. These are
combined with TRQ Index (TRQIx), CbfId, CchBSz and CchBBs to arrive at a GlbRam address from which to read or
to which to write. The values in CxCCtl are initially loaded from the cached TCB's CPU Variables field (CpuVars)
whenever a context first gains ownership of a connection. The value in the CchBuf is updated immediately prior to
relinquishing control of the connection. The constant TRQ Size (TRQSz) indicates when the values in CxCCtl should wrap
around to zero. FIG. 24 shows a Delegated Variables Space.

Four command sub-structures are implemented in the CchBuf. Two of these provide storage for receive commands -
RCmdA and RCmdB - and the remaining two provide storage for transmit commands - XCmdA and XCmdB. The
commands are used in a ping-pong fashion, allowing the DMA to store the next command or the next SGL entry in one
command area while the CPU is actively using the other. Having a fixed size of 32 bytes, the command areas are defined
by the configuration constant Command Index (CmdIx).

The DMA Director includes a ring mode which copies command descriptors from the XmtRngs and RcvRngs to the
command sub-structures - XCmdA, XCmdB, RCmdA and RCmdB. The commands are retrieved from sequential entries of
the host resident rings. A pointer to these entries is stored in the cached TCB in the substructure RngCtrl and is
automatically incremented by the DMA Director upon completion of a command fetch. Delivery to the CchBuf resident
command sub-structure is ping-ponged, controlled by the CpuVars bits XCmdOdd and RCmdOdd, which are essentially
images of XRngPtr[0] and RRngPtr[0] held in CxCCtl. These bits are used to form composite registers for use in DMA
Director commands.

CpuVars 



RngCtrl

Bits   Name     Description
31:24  XRngCnt  Transmit Ring Command Count
23:16  XRngPtr  Transmit Ring Command Pointer
15:08  RRngCnt  Receive Ring Command Count
07:00  RRngPtr  Receive Ring Command Pointer



DRAM CONTROLLER (RcvDrm/Drm | XmtDrm/Drm)

FIG. 26 shows a DRAM Controller. The DRAM controllers provide access to Dram (RcvDrm/Drm) and Dram (XmtDrm/Drm).

RcvDrm primarily serves to buffer incoming packets and TCBs while XmtDrm primarily buffers outgoing data and DramQ
data. FIG. 25 shows the allocation of buffers residing in each of the drams. Both transmit and receive drams are
partitioned into data buffers for reception and transmission of packets. At initialization time, select buffer handles are
eliminated and the reclaimed memory is instead dedicated to storage of DramQ and TCB data.

XDC supports checksum and crc generation while writing to XmtDrm. XDC also provides a crc appending capability at
completion of write data copying. RDC supports checksum and crc generation while reading from RcvDrm and also
supports reading additional crc bytes, for testing purposes, which are not copied to the destination. Both controllers provide
support for priming checksum and crc functions.

The RDC and XDC modules operate using clocks with frequencies which are independent of the remainder of the system.
[...] size of 16 cycles of 16B per cycle. Double data rate (DDR) dram of the type RLDRAM is utilized.
[...] dram block. E.g., a starting dram address of 5 would limit XfrCnt to 128-5 or 123.

The Dram Controller includes the following seven functional sub-modules:

PrsDstSqr - Parser to Drm Destination Sequencer monitors the RcvDmaQues and moves data to RcvDrm. No response is
assembled.

D2hSrcSqr - Drm to Host Source Sequencer accepts commands from the DmdDspSqr, moves data from RcvDrm to
D2hDmaQ preceded by a destination header and followed by a status trailer.

D2gSrcSqr - Drm to GlbRam Source Sequencer accepts commands from the DmdDspSqr, moves data from RcvDrm to
D2gDmaQ preceded by a destination header and followed by a status trailer.

H2dDstSqr - Host to Drm Destination Sequencer monitors the H2dDmaQue and moves data to XmtDrm. It then assembles
and presents a response to the DmdRspSqr.

G2dDstSqr - GlbRam to Drm Destination Sequencer monitors the G2dDmaQue and moves data to XmtDrm. It then
assembles and presents a response to the DmdRspSqr.

D2dCpySqr - Drm to Drm Copy Sequencer accepts commands from the DmdDspSqr, moves data from Drm to Drm, then
assembles and presents a response to the DmdRspSqr.

XmtSrcSqr - Drm to Formatter Source Sequencer accepts commands from the XmtFmtSqr, moves data from XmtDrm to
XmtDmaQ preceded by a destination header and followed by a status trailer.

DMA Director (DmaDir/Dmd) 

The DMA Director services DMA requests on behalf of the CPU and the Queue Manager. There are eleven distinct DMA
channels which the CPU can utilize and two channels which the Queue Manager can use. The CPU employs a [...]
blocks. The abbreviated methods are not [...]

Channel DscMd TcbMd SglMd KbfMd PhdMd CrcAcc Function
Pxh - - Global Ram to XmtH Queue
Pxl Global Ram to XmtL Queue
D2g * - - * * Dram to Global Ram
D2h * * Dram to Host
D2d * - Dram to Dram
G2d Global Ram to Dram
G2h * * - Global Ram to Host
H2d *---** Host to Dram
H2g ***- Host to Global Ram

Figure 27 is a block diagram depicting the functional units of the DMA Director. These units and their functions are:

o GRmCtlSqr (Global Ram Control Sequencer)
- Performs Global Ram reads and writes as requested.

o DmdDspSqr (DMA Director Dispatch Sequencer)
- Monitors command queue write sequences and fetches queued entries from Global Ram.
- Parses command queue entry.
- Fetches DMA descriptors if indicated.
- Fetches crc and checksum primers if indicated.
- Fetches TCB SGL pointer if indicated.
- Presents a compiled command to the DMA source sequencers.

o PxySrcSqr (Proxy Source Sequencer)
- Monitors the Proxy Queue write sequence and fetches queued command entries from GRm.
- Parses proxy commands.
- Requests and moves data from GRmCtl to QMgr.
- Extracts Proxy Buffer ID and presents to DmdRspSqr.

o G2?SrcSqr (G2d/G2h Source Sequencer)
- Requests and accepts commands from DmdDspSqr.
- Loads destination header into DmaQue.
- Requests and moves data from GRmCtl into DmaQue.
- Compiles source status trailer and moves into the DmaQue.

o ?2gDstSqr (D2g/H2g Destination Sequencer)
- Unloads and stores destination header from DmaQue.
- Unloads data from DmaQue and presents to GRmCtl.
- Unloads source status trailer from DmaQue.
- Compiles DMA response and presents to DmdRspSqr.

o DmdRspSqr (DMA Director Response Sequencer)
- Accepts DMA response descriptor from DstSqr.
- Updates DmaDsc if indicated.
- Saves response to response queue if indicated.

DMA commands utilize configuration information in order to proceed with execution. Global constants such as TCB length,
SGL pointer offsets and so on are set up by the CPU at time zero. The configurable constants are:

o CmdQBs - Command Queue Base.
o EvtQBs - Event Queue Base.
o TcbBBs - TCB Buffer Base.
o DmaBBs - DMA Descriptor Base.
o HdrBBs - Header Buffer Base.
o CchBBs - Cache Buffer Base.
o HdrBSz - Header Buffer Size.
o CchBSz - Cache Buffer Size.
o SglPIx - SGL Pointer Index.
o MemDscIx - Memory Descriptor Index.
o MemDscSz - Memory Descriptor Size.
o TRQIx - Receive Queue Index.
o PHdrIx - Tcb ProtoHeader Index.
o TcbHSz - Tcb ProtoHeader Size.
o HDmaSz - Header Dma Sizes A:D.
o TRQSz - Receive Queue Size.

Figure X depicts the blocks involved in a DMA. The processing steps of a descriptor mode DMA are:

o CPU obtains use of a CPU Context identifier.
o CPU selects a free descriptor buffer available for the current CPU Context identifier.
o CPU assembles command variables in the descriptor buffer.
o CPU assembles command and deposits it in the DmdCmdQ.
o CPU may suspend the current context or continue processing in the current context.
o DmdDspSqr detects DmdCmdQ not empty.
o DmdDspSqr fetches command queue entry from GRm.
o DmdDspSqr uses command queue entry to fetch command descriptor from GRm.
o DmdDspSqr presents compiled command to DmaSrcSqr on DmaCmdDsc lines.
o DmaSrcSqr accepts DmaCmdDsc.
o DmaSrcSqr deposits destination variables (DmaHdr) into DmaQue along with control marker.
o DmaSrcSqr presents read request and variables to source read controller.
o DrmCtlSqr detects read request and moves data from Drm to DmaSrcSqr along with status.
o DmaSrcSqr moves data to DmaDmaQue and increments DmaSrcCnt for each word.
o DmaSrcSqr deposits ending status (DmaTlr) in DmaDmaQue along with control marker.
o DmaDstSqr fetches DmaHdr and DMA data from DmaQue.
o DmaDstSqr requests destination write controller to move data to destination.
o DmaDstSqr fetches DmaTlr from DmaQue.
o DmaDstSqr assembles response descriptor and presents to DmdRspSqr.
o DmdRspSqr accepts response descriptor.
o DmdRspSqr updates GRm resident DMA descriptor block if indicated.
- Indication is use of descriptor block mode.
o DmdRspSqr assembles DMA response event and deposits in C??EvtQ, if indicated.
- Indications are RspEn or occurrence of DMA error.
o CPU removes entry from CtxEvtQ and parses it.

FIG. 28 is a DMA Flow Diagram. The details of each step vary based on the DMA channel and command mode. The
following sections outline events which occur for each of the DMA channels.

Proxy Command for Pxh and Pxl 

Proxy commands provide firmware an abbreviated method to specify an operation to copy data from GRm resident Proxy
Buffers (PxyBufs) on to transmit command queues. DMA variables are retrieved and/or calculated using the DMA
command fields in conjunction with configuration constants. The command is assembled and deposited into the proxy
command queue (PxhCmdQ or PxlCmdQ) by the CPU. The format of the 32-bit, descriptor-mode, command-queue entry is:

Bits   Name   Queue Word Description
31:21  Rsvd   Zeroes.
20:20  PxySz  Copy count expressed as units of 16-byte words. 0 == 16 words.
19:17  Rsvd   Zeroes.
16:00  PxyAd  Address of Proxy Buffer.

FIG. 29 is a Proxy Flow Diagram. PxyBufs comprise a shared pool of GlbRam. PxyBuf pointers are memory address
pointers [...] PxyBufs are each represented by an entry in the Proxy Buffer Queue [...]

The format of the 32-bit Proxy Buffer descriptor is:

Bits   Name   Queue Word Description
31:16  Rsvd   Zeroes.
16:00  PxyAd  Address of Proxy Buffer. PxyAd[2:0] are zeroes.

TCB Mode DMA Command for G2h and H2g 

[...] The DMA size is determined by the configuration constants:

- G2hTcbSz
- H2gTcbSz

The format of the 32-bit, TCB-mode, command-queue entry is:

Bits   Name   Description
31:31  RspEn  Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following
              termination of a DMA operation.
30:29  CmdMd  Command Mode must be set to 3. Specifies this entry is a TcbMd command.
28:24  CpuCx  Indicates the context of the CPU which originated this command. CpuCx also specifies a response queue for
              DMA responses.
23:21  DmaCx  DMA Context is ignored by hardware.
20:19  DmaTg  DMA Tag is ignored by hardware.
18:12  CbfId  Specifies a GRm resident Cache Buffer.
11:00  TbfId  Specifies host resident TCB Buffer.

Variable TbfId and configuration constants CchBSz and CchBBs are used to calculate GlbRam as well as HstMem
addresses for the copy operation. They are formulated as follows:

GRmAd = CchBBs + (CbfId * CchBSz);
HstAd = TcbBBs + (TbfId * 2K);

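
As an illustration, the TCB-mode queue entry and the two address formulas can be sketched as below. Field positions follow the table above, with CpuCx taken to occupy bits 28:24 as in the other command formats; the function names and the example constant values are hypothetical.

```python
TCB_MODE = 3  # CmdMd value for a TcbMd command


def pack_tcbmd(rsp_en, cpu_cx, cbf_id, tbf_id):
    """Build a 32-bit TcbMd entry; DmaCx and DmaTg are left zero (ignored)."""
    return ((rsp_en & 0x1) << 31
            | TCB_MODE << 29
            | (cpu_cx & 0x1F) << 24
            | (cbf_id & 0x7F) << 12
            | (tbf_id & 0xFFF))


def tcbmd_addresses(entry, cch_bbs, cch_bsz, tcb_bbs):
    """Return (GRmAd, HstAd) for a TcbMd entry using the formulas in the text."""
    cbf_id = (entry >> 12) & 0x7F
    tbf_id = entry & 0xFFF
    grm_ad = cch_bbs + cbf_id * cch_bsz
    hst_ad = tcb_bbs + tbf_id * 2048   # host TCB buffers are 2K apart
    return grm_ad, hst_ad


if __name__ == "__main__":
    entry = pack_tcbmd(rsp_en=1, cpu_cx=5, cbf_id=3, tbf_id=10)
    # Example constants only; real values come from the configuration registers.
    print(hex(entry), [hex(a) for a in tcbmd_addresses(entry, 0x4C00, 1024, 0x100000)])
```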
Command Ring Mode DMA Command for H2g 

[...] Cache Buffers (Cbfs). DMA variables are retrieved and/or calculated using the DMA command
fields in conjunction with configuration constants. Transmit ring command pointer (XRngPtr) and receive ring command
pointer (RRngPtr) [...]
Count (XRngCnt) and receive ring count (RRngCnt). The DMA size is fixed at 32.
The format of the 32-bit, Ring-mode, command-queue entry is:

Bits   Name   Description
31:31  RspEn  Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following
              termination of a DMA operation.
30:29  CmdMd  Command Mode must be set to 2. Specifies this entry is a RngMd command.
28:24  CpuCx  Indicates the context of the CPU which originated this command.
              CpuCx also specifies a response queue for DMA responses.
23:21  DmaCx  DMA Context is ignored by hardware.
20:20  OddSq  Selects the odd or even command buffer of the TCB cache as the destination of the command descriptor.
              This bit can be taken from XRngPtr[0] or RRngPtr[0].
19:19  XmtMd  When set, indicates that the transfer is from the host transmit command ring to the CchBuf. When reset,
              indicates that the transfer is from the host receive command ring to the CchBuf.
18:12  CbfId  Specifies a GRm resident Cache Buffer.
11:00  TbfId  Specifies host resident TCB Buffer.

Variables TbfId and CbfId and configuration constants XRngBs, RRngBs, XRngSz, RRngSz, XmtCmdIx, CchBSz and
CchBBs are used to calculate GlbRam as well as HstMem addresses for the copy operation. They are formulated as
follows for transmit command ring transfers:

GRmAd = CchBBs + (CbfId * CchBSz) + XmtCmdIx + 32;
HstAd = XRngBs + (((TbfId << XRngSz) + XRngPtr) * 32);

They are formulated as follows for receive command ring transfers:

GRmAd = CchBBs + (CbfId * CchBSz) + RcvCmdIx + 32;
HstAd = RRngBs + (((TbfId << RRngSz) + RRngPtr) * 32);

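
The host-side ring address formula can be sketched as a small helper. Because the formula shifts TbfId by the ring size constant, this sketch assumes the constant holds log2 of the entries per ring; the helper name and the example values are illustrative.

```python
CMD_DESC_BYTES = 32  # each ring entry is a 32-byte command descriptor


def ring_host_address(ring_bs, ring_sz_log2, tbf_id, ring_ptr):
    """Host address of the next command descriptor for one connection.

    ring_bs      - ring base address (XRngBs or RRngBs)
    ring_sz_log2 - assumed log2 of entries per ring (XRngSz or RRngSz)
    tbf_id       - host TCB buffer identifier, selects the connection's ring
    ring_ptr     - current ring pointer (XRngPtr or RRngPtr)
    """
    return ring_bs + (((tbf_id << ring_sz_log2) + ring_ptr) * CMD_DESC_BYTES)


if __name__ == "__main__":
    # 256-entry rings (log2 = 8), connection 2, sixth entry of its ring.
    print(hex(ring_host_address(0x1000000, 8, 2, 5)))
```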
SGL Mode DMA Command for H2g 

SGL Mode (SglMd) commands provide firmware an abbreviated method to specify an operation to copy SGL entries from
the host resident SGL to the GRm resident TCB. DMA variables are retrieved and/or calculated using the DMA command
fields in conjunction with configuration constants and TCB resident variables. Either a transmit or receive SGL may be
specified via CmdMd[0]. This command is assembled and deposited into the H2g Dispatch Queue by the CPU. The
format of the 32-bit, descriptor-mode, command-queue entry is:

Bits   Name   Description
31:31  RspEn  Response Enable causes an entry to be written to one of the 32 response queues
              (C??EvtQ) following termination of a DMA operation.
30:29  CmdMd  Command Mode==1 specifies that an SGL entry is to be fetched.
28:24  CpuCx  CPU Context indicates the context of the CPU which originated this command. CpuCx specifies a response
              queue for DMA responses.
23:21  DmaCx  DMA Context is ignored by hardware.
20:20  OddSq  Selects the odd or even command buffer of the TCB cache as the source of the SGL pointer. Selects the
              opposite command buffer as the destination of the memory descriptor. This bit can be taken from
              XRngPtr[0] or RRngPtr[0].
19:19  XmtMd  When set, indicates that the transfer should use the transmit command buffers of the TCB cache buffer as
              the source of the SGL pointer and the destination of the memory descriptor. When reset, indicates that the
              transfer should use the receive command buffers of the TCB cache buffer as the source of the SGL pointer
              and the destination of the memory descriptor.
18:12  CbfId  Specifies the Cache Buffer to which the SGL entry will be transferred.
11:00  Rsvd   Ignored.

CmdMd and CbfId are used along with configuration constants CchBSz, CchBBs, SglPIx and MemDscIx to calculate
addresses. The 64-bit SGL pointer, which resides in a Cache Buffer, is fetched using an address formulated as:

GRmAd = CchBBs + (CbfId * CchBSz) + SglPIx + (IxSel * MemDscSz);

The retrieved SGL pointer is then used to fetch a 12-byte memory descriptor from host memory which is in turn written to
the Cache Buffer at an address formulated as:

GRmAd = CchBBs + (CbfId * CchBSz) + MemDscIx + (IxSel * 16);

The SGL pointer is then incremented by the configuration constant SGLIncSz then written back to the CchBuf.

Event Mode DMA Command for G2h 

Event Mode (EvtMd) commands provide firmware an abbreviated method to specify an operation to copy an event
descriptor between GRm and HstMem. DMA variables are retrieved and/or calculated using the DMA command fields in
conjunction with configuration constants. The DMA size is fixed at 16 bytes. Data are copied from an event descriptor
buffer determined by {DmaCx.CpuCx}.

The format of the 32-bit, Event-mode, command-queue entry is:

Bits   Name   Description
31:31  RspEn  Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following
              termination of a DMA operation.
30:29  CmdMd  Command Mode must be set to 2. Specifies this entry is an EvtMd command.
28:24  CpuCx  Indicates the context of the CPU which originated this command. CpuCx also specifies a response queue
              for DMA responses.
23:21  DmaCx  DMA context specifies the DMA descriptor block in which the event descriptor (EvtDsc) resides.
20:19  DmaTg  DMA Tag is ignored by hardware.
17:15  NEQId  Specifies a host resident NIC event queue.
14:00  NEQSq  NIC event queue write sequence specifies which entry to write.

Command variables NEQId and NEQSq and configuration constants DmaBBs, NEQSz and NEQBs are used to calculate
the HstMem and GlbRam addresses for the copy operation. They are formulated as follows:

GRmAd = DmaBBs + {CpuCx, DmaCx, 5'b00000};

HstAd = NEQBs + {(NEQId * NEQSz) + NEQSq, 5'b00000};
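
The brace notation in the formulas above is bit concatenation, which translates to shifts and ORs. The following sketch transcribes the two formulas literally; the function names are hypothetical, and the field widths (5-bit CpuCx, 3-bit DmaCx) are taken from the command-queue entry layout.

```python
def event_grm_address(dma_bbs, cpu_cx, dma_cx):
    """GRmAd = DmaBBs + {CpuCx, DmaCx, 5'b00000}."""
    index = ((cpu_cx & 0x1F) << 3) | (dma_cx & 0x7)  # {CpuCx, DmaCx}
    return dma_bbs + (index << 5)                    # five low zero bits


def event_host_address(neq_bs, neq_sz, neq_id, neq_sq):
    """HstAd = NEQBs + {(NEQId * NEQSz) + NEQSq, 5'b00000}."""
    return neq_bs + (((neq_id * neq_sz) + neq_sq) << 5)


if __name__ == "__main__":
    # Example values only: CPU context 1, DMA context 6, queue 2, entry 3.
    print(hex(event_grm_address(0x2000, 1, 6)))
    print(event_host_address(0, 1024, 2, 3))
```

The five appended zero bits scale the index by 32, which matches the 32-byte size of a NIC event descriptor.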

Prototype Header Mode DMA Command for H2d 

Prototype Header Mode (PhdMd) commands provide firmware an abbreviated method to specify an operation to copy
prototype headers to DRAM Buffers from host resident TCB Buffers (Tbf). DMA variables are retrieved and/or calculated
using the DMA command fields in conjunction with configuration constants. This command is assembled and deposited
into a dispatch queue by the CPU. CmdMd[0] selects the dma size as follows:

- H2dHdrSz[CmdMd[0]]

The format of the 32-bit, protoheader-mode, command-queue entry is:

Bits Name Description
31:31 RspEn Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following termination of a DMA operation.
30:29 CmdMd Command Mode must be set to 2 or 3. It specifies this entry is a HdrMd command.
28:24 CpuCx CPU Context indicates the context of the CPU which originated this command. CpuCx specifies a response queue for DMA responses and is also used to specify a GlbRam-resident Header Buffer.
23:12 XbfId Specifies a DRAM Transmit Buffer.
11:00 TbfId Specifies host resident TCB Buffer.

Configuration constants PHdrIx and CmpBBs are used to calculate the host address for the copy operation. The
addresses are formulated as follows:

HstAd = (TbfId*1K) + CmpBBs; DrmAd = XbfId*256;

This command does not include a DmaCx or DmaTg field. Any resulting response will have the DmaCx and DmaTg fields
set to 5'b11011.

Prototype Header Mode DMA Command for G2d 

Prototype Header Mode (PhdMd) commands provide firmware an abbreviated method to specify an operation to copy
prototype headers to DRAM Buffers from GRm resident TCB Buffers (Tbf). DMA variables are retrieved and/or calculated
using the DMA command fields in conjunction with configuration constants. This command is assembled and deposited
into a dispatch queue by the CPU. CmdMd[0] selects the dma size as follows:

- G2dHdrSz[CmdMd[0]]

The format of the 32-bit, protoheader-mode, command-queue entry is:

Bits Name Description
31:31 RspEn Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following termination of a DMA operation.
30:29 CmdMd Command Mode must be set to 2 or 3. It specifies this entry is a HdrMd command.
28:24 CpuCx CPU Context indicates the context of the CPU which originated this command. CpuCx specifies a response queue for DMA responses and is also used to specify a GlbRam-resident Header Buffer.
23:21 DmaCx DMA Context is ignored by hardware.
20:19 DmaTg DMA Tag is ignored by hardware.
18:12 CbfId Specifies a GRm resident Cache Buffer.
11:00 XbfId Specifies a DRAM Transmit Buffer.

Configuration constants CchBSz and CchBBs are used to calculate GlbRam and dram addresses for the copy operation.
They are formulated as follows:

GRmAd = (CbfId*CchBSz) + CchBBs + PHdrIx; DrmAd = XbfId*256;

Header Buffer Mode DMA Command for D2g

Header Buffer Mode (HbfMd) commands provide firmware an abbreviated method to specify an operation to copy headers
from DRAM Receive Buffers to GlbRam-resident Header Buffers (Hbf). DMA variables are retrieved and/or calculated
using the DMA command fields in conjunction with configuration constants. This command is assembled and deposited
into a dispatch queue by the CPU.

The format of the 32-bit, header-mode, command-queue entry is:
Bits Name Description
31:31 RspEn Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following termination of a DMA operation.
30:29 CmdMd Command Mode must be set to 1. It specifies this entry is a HdrMd command.
28:24 CpuCx CPU Context indicates the context of the CPU which originated this command. CpuCx specifies a response queue for DMA responses and is also used to specify a GlbRam-resident Header Buffer.
23:21 DmaCx DMA Context is ignored by hardware.
20:19 DmaTg DMA Tag is ignored by hardware.
18:17 DmaCd DmaCd selects the dma size as follows: D2gHdrSz[DmaCd]
16:16 HbfId Used in conjunction with CpuCx to specify a Header Buffer.
15:00 RbfId Specifies a DRAM Receive Buffer for the D2g channel.

Configuration constants HdrBSz and HdrBBs are used to calculate GlbRam and dram addresses for the copy operation.
They are formulated as follows:

GRmAd = HdrBBs + ({CpuCx, HbfId} * HdrBSz); DrmAd = RbfId * 32;
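As an illustration of how the header-mode entry and its address formulas fit together, the following sketch unpacks a command word and derives both addresses. The class and function names are hypothetical, and the configuration constants HdrBBs and HdrBSz are passed in as plain arguments because their actual values are not given here.

```python
from typing import NamedTuple


class HbfMdCmd(NamedTuple):
    rsp_en: int   # bit 31
    cmd_md: int   # bits 30:29, expected to be 1 for this mode
    cpu_cx: int   # bits 28:24
    dma_cd: int   # bits 18:17, selects the header size
    hbf_id: int   # bit 16
    rbf_id: int   # bits 15:00


def decode_hbfmd(cmd: int) -> HbfMdCmd:
    """Split a 32-bit header-mode command word into its fields."""
    return HbfMdCmd(
        rsp_en=(cmd >> 31) & 0x1,
        cmd_md=(cmd >> 29) & 0x3,
        cpu_cx=(cmd >> 24) & 0x1F,
        dma_cd=(cmd >> 17) & 0x3,
        hbf_id=(cmd >> 16) & 0x1,
        rbf_id=cmd & 0xFFFF,
    )


def hbfmd_addresses(c: HbfMdCmd, hdr_bbs: int, hdr_bsz: int):
    """Return (GRmAd, DrmAd) using the formulas in the text."""
    # {CpuCx, HbfId} is a 6-bit concatenation: two header buffers per context.
    grm_ad = hdr_bbs + (((c.cpu_cx << 1) | c.hbf_id) * hdr_bsz)
    drm_ad = c.rbf_id * 32
    return grm_ad, drm_ad
```

The DmaCx and DmaTg fields are omitted from the tuple because the table states that hardware ignores them in this mode.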
Descriptor Mode DMA Command for D2h, D2g, D2d, H2d, H2g, G2d and G2h

Descriptor Mode (DscMd) commands allow firmware greater flexibility in defining copy operations through the inclusion of
additional variables assembled within a GlbRam-resident DMA Descriptor Block (DmaDsc). This
command is assembled and deposited into a DMA dispatch queue by the CPU. The format of the 32-bit, descriptor-mode,
command-queue entry is:

Bits Name Description
31:31 RspEn Response Enable causes an entry to be written to one of the 32 response queues (C??EvtQ) following termination of a DMA operation.
30:29 CmdMd Command Mode must be set to 0. It specifies this entry is a DscMd command.
28:24 CpuCx CPU Context indicates the context of the CPU which originated this command. This field, in conjunction with DmaCx, is used to create a GlbRam address for the retrieval of a DMA descriptor block. CpuCx also specifies a response queue for DMA Responses and specifies a crc/checksum accumulator to be used for the crc/checksum accumulate option.
23:21 DmaCx DMA Context is used along with CpuCx to retrieve a DMA descriptor block.
20:19 DmaTg DMA Tag is ignored by hardware.
[...] checksum values fetched from GlbRam at location ChkAd. This option is valid only when ChkAd != 0.
16:03 ChkAd Check Address specifies GRmAd[16:03] for fetch/store of crc and checksum values. ChkAd == 0 indicates that the accumulate functions should start with a checksum value of 0 and that the accumulated checksum value should be stored in the DMA descriptor block only, that crc functions must be disabled and that the CrcAccs must not be altered. If ChkAd == 0 then AccLd and TstSz/AppSz are ignored. The accumulator functions are valid for D2h, D2g, H2d and G2d channels only.
02:00 TstSz This option is valid for D2h and D2g channels only. Causes TstSz bytes of source data to be read and accumulated but not copied to the destination. A maximum value of 7 allows a four byte crc and up to three bytes of padding to be tested.
02:00 AppSz This option is valid for H2d and G2d channels only. Causes AppSz bytes of the CrcAcc and zeroes to be appended to the end of data being copied. This option is valid only when ChkAd != 0. An append size of one to four bytes results in the same number of bytes of crc being sent to the checksum accumulator and written to the destination. An append size greater than four bytes results in the appending of the crc plus zeroes.

AppSz Appended
0 {Null}
1 {CrcAcc[31:24]}
2 {CrcAcc[31:16]}
3 {CrcAcc[31:08]}
4 {CrcAcc[31:00]}
5 {08'b0, CrcAcc[31:00]}
6 {16'b0, CrcAcc[31:00]}
7 {24'b0, CrcAcc[31:00]}
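The AppSz table can be read as a small lookup that yields a bit width and a value. The sketch below reproduces the table literally, treating each row as a Verilog-style concatenation in which the zero bits occupy the upper positions. The function name is hypothetical, and the order in which the resulting bytes reach the destination is not stated in the text, so it is deliberately left out.

```python
def appended_field(app_sz: int, crc_acc: int):
    """Return (width_in_bits, value) for the field selected by AppSz.

    crc_acc models the 32-bit CrcAcc accumulator.
    """
    if not 0 <= app_sz <= 7:
        raise ValueError("AppSz is a 3-bit field")
    crc_acc &= 0xFFFFFFFF
    width = 8 * app_sz
    if app_sz <= 4:
        # Rows 0..4: the top AppSz bytes of CrcAcc (row 0 is empty).
        return width, crc_acc >> (32 - width)
    # Rows 5..7: zero bits concatenated above the full 32-bit CrcAcc.
    return width, crc_acc
```

With CrcAcc = 0x12345678, AppSz = 2 selects the 16-bit value 0x1234 and AppSz = 5 selects a 40-bit field whose low 32 bits hold the full crc.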
DMA Descriptor 

The DMA Descriptor (DmaDsc) is an extension utilized by DscMd DMA commands to allow added specification of DMA
variables [...] non-locked queue access method. The DmaDsc variables are assembled in GlbRam-resident DmaDsc Buffers
[...]. Each CpuCx has preallocated GlbRam memory which accommodates eight DmaDscs per CpuCx for a total of
256 DmaDscs. The DmaDscs are accessed using a GlbRam starting address formulated as:

GRmAd = DmaBBs + ({CpuCx, DmaCx} * 16)
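A quick way to check the arithmetic (32 contexts, eight descriptors each, 16 bytes per descriptor) is to compute the starting address for every {CpuCx, DmaCx} pair. The helper below is a hypothetical model of the formula above; DmaBBs is supplied by the caller because its value is a configuration constant.

```python
DMADSC_BYTES = 16  # four 32-bit words per DmaDsc


def dmadsc_address(dma_bbs: int, cpu_cx: int, dma_cx: int) -> int:
    """GRmAd = DmaBBs + ({CpuCx, DmaCx} * 16)."""
    if not (0 <= cpu_cx < 32 and 0 <= dma_cx < 8):
        raise ValueError("CpuCx is 5 bits and DmaCx is 3 bits")
    index = (cpu_cx << 3) | dma_cx   # 8-bit concatenation, 0..255
    return dma_bbs + index * DMADSC_BYTES
```

Enumerating all pairs yields 256 distinct, contiguous 16-byte slots, which matches the stated total of 256 DmaDscs.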

DmaDscs are fetched by the DmdDspSqr and used, in conjunction with DmaCmds, to assemble a descriptor for
presentation to [...]. DmaDscs are also updated, upon DMA termination, with ending
status comprising variables which reflect the values of address and length counters.

Word Bits Name Description
03 31:00 HstAdH Host Address High provides the address bits [63:32] used by the BIU. This field is updated at transfer termination if either RspEn is set or an error occurred. HstAdH is valid for D2h, G2h, H2d and H2g channels only.
02 31:00 HstAdL Host Address Low provides the address bits [31:00] used by the BIU. This field is updated at transfer termination. HstAdL is valid for D2h, G2h, H2d and H2g channels only.
   27:00 DrmAdr Dram Ram Address is used as a source address for D2d. Updated at transfer termination.
   [...] Global Ram Address is used as a destination address for D2g and as a source address for G2d DMAs. Updated at transfer termination.
01 31:31 Rsvd Reserved.
   30:30 RlxDbl Relax Disable clears the relaxed-ordering-bit in the host bus attributes. It is valid for D2h, G2h, H2d and H2g channels only.
   29:29 SnpDbl Snoop Disable sets the no-snoop-bit in the host bus attributes. It is valid for D2h, G2h, H2d and H2g channels only.
   28:28 PadEnb Pad Enable causes data copies to RcvDrm or XmtDrm, which do not terminate on an eight byte boundary, to be padded to the eight byte boundary. This has the effect of inhibiting read-before-write cycles, thereby improving performance.
   27:00 DrmAdr Dram Address provides the dram address for RcvDrm and XmtDrm. This field is updated at transfer termination. DrmAd is valid for D2h, D2g, H2d and G2d channels only.
   16:00 GRmAd Global Ram Address is used as a destination address for H2g and as a source address for G2h DMAs. This field is updated at transfer termination.
00 31:26 Rsvd Reserved.
   25:23 FuncId Specifies the PCIe function ID for transfers and interrupts.
   22:22 IntCyc Used by the G2h channel to indicate that an interrupt set or clear should be performed upon completion of the transfer operation.
   21:21 IntClr 1: Interrupt clear. 0: Interrupt set. For legacy interrupts.
   20:16 IntVec Specifies the interrupt vector for message signaled interrupts.
   15:00 XfrLen Transfer Length specifies the quantity of data bytes to transfer. A length of zero indicates that no data should be transferred. Functions as storage for the checksum accumulated during an error free transfer and is updated at transfer termination. If a transfer error is detected, this field will instead contain the residual transfer length.

DMA Event for D2h, D2g, D2d, H2d, H2g, G2d and G2h

DMA Event (DmaEvt) is a 32-bit entry which is deposited into one of the 32 Context Dma Event Queues (C??EvtQ), upon
termination of a DMA operation, if RspEn is set or if an error condition was encountered. The event is used to resume
processing by a CPU Context and to relay DMA status. The format of the 32-bit event descriptor is as follows:

Bits Name Description
31:31 RspEn Copied from dispatch queue entry.
30:29 CmdMd Copied from dispatch queue entry.
28:24 CpuCx Copied from dispatch queue entry.
23:21 DmaCx Copied from dispatch queue entry. Forced to 3'b111 for H2d PhdMd.
20:19 DmaTg Copied from dispatch queue entry. Forced to 2'b11 for H2d PhdMd.
18:15 DmaCh Indicates the responding DMA channel.
14:05 Rsvd Reserved.
04:04 RdErr Set for source errors. Cleared for destination errors.
03:00 ErrCd Error code. 0 - No error.

A response is forced regardless of the state of the RspEn bit anytime an error is detected. [...]
register will be set. Dma option is not updated for commands which encounter an error, but the dma descriptor is updated
to reflect the residual transfer count at time of error.
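Firmware that pops a DmaEvt from its context event queue would typically split the word into the fields listed above before acting on it. The following is a minimal sketch of that decoding step; the function name and the dictionary layout are illustrative assumptions, and only the field positions come from the table.

```python
def decode_dma_evt(evt: int) -> dict:
    """Split a 32-bit DMA event descriptor into named fields."""
    fields = {
        "rsp_en": (evt >> 31) & 0x1,
        "cmd_md": (evt >> 29) & 0x3,
        "cpu_cx": (evt >> 24) & 0x1F,
        "dma_cx": (evt >> 21) & 0x7,
        "dma_tg": (evt >> 19) & 0x3,
        "dma_ch": (evt >> 15) & 0xF,
        "rd_err": (evt >> 4) & 0x1,   # 1: source error, 0: destination error
        "err_cd": evt & 0xF,          # 0 means no error
    }
    # Convenience flag (not a hardware field): any nonzero error code.
    fields["failed"] = fields["err_cd"] != 0
    return fields
```

Because a response is forced whenever an error is detected, a handler cannot assume that `rsp_en` is set for every event it receives.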

ETHERNET MAC AND PHY 

FIG. 30 shows a Ten-Gigabit Receive Mac In Situ. 

FIG. 31 shows a Transmit/Receive Mac Queue Implementation.

RECEIVE SEQUENCER (RcvSqr/RSq) 

The Receive Sequencer is depicted in FIG. 32 in situ along with connecting modules. RcvSqr functional sub-modules
include the Receive Parser (RcvPrsSqr) and the Socket Detector (SktDetSqr). The RcvPrsSqr parses frames, DMAs them
to RcvDrm and passes socket information on to the SktDetSqr. The SktDetSqr compares the parse information with socket
descriptors from SktDscRam, compiles an event descriptor and pushes it on to the RSqEvtQ. Two modes of operation
provide support for either a single ten-gigabit mac or for four one-gigabit macs.

The receive process steps are: 

□ RcvPrsSqr pops a RbfId off of the RcvBufQ.
□ RcvPrsSqr waits for 110B of data or PktRdy from RcvMacQ.
□ RcvPrsSqr pushes RcvDrmAd onto PrsHdrQ.
□ RcvPrsSqr parses frame headers and moves to RcvDmaQ.
□ RcvPrsSqr moves residual of 110B of data from RcvMacQ to PrsHdrQ.
□ RcvPrsSqr pushes RcvDrmAd + 128 onto PrsDatQ.
□ RcvPrsSqr moves residual frame data from RcvMacQ to PrsDatQ and releases to RcvDstSqr.
□ RcvDstSqr pops RcvDrmAd + 128 off of RcvDatQ then pops data and copies to RcvDrm.
□ RcvPrsSqr prepends parse header to frame header on PrsHdrQ and releases to RcvDstSqr.
□ RcvDstSqr pops RcvDrmAd off of RcvDatQ then pops header + data and copies to RcvDrm.
□ RcvPrsSqr assembles and pushes PrsEvtDsc onto PrsEvtQ.
□ SktDetSqr pops PrsEvtDsc off of PrsEvtQ.
□ SktDetSqr uses Toeplitz hash to select SktDscGrp in SktDscRam.
□ SktDetSqr compares PrsEvtDsc with SktDscGrp entries (SktDscs).
□ SktDetSqr assembles RSqEvtDsc based on results and pushes onto RSqEvtQ.
□ CPU pops RSqEvtDsc off of RSqEvtQ.
□ CPU performs much magic here.
□ CPU pushes RbfId onto RcvBufQ.
Receive Configuration Register (RcvCfgR) 
Bits Name Description
031:031 Reset Force reset asserted to the receive sequencer.
030:030 DetEn Socket detection enable.
029:029 RcvFsh Force the receive sequencer to flush prefetched RcvBufs.
028:028 RcvEnb Allow parsing of receive packets.
027:027 RcvAll Allow forwarding of all packets regardless of destination address.
026:026 RcvBad Allow forwarding of packets for which a link error was detected.
025:025 RcvCtl Allow forwarding of 802.3X control packets.
024:024 CmdEnb Allow execution of 802.3X control packet commands; e.g. pause.
023:023 AdrEnH Allow forwarding of packets with the MacAd == RcvAddrH.
022:022 AdrEnG Allow forwarding of packets with the MacAd == RcvAddrG.
021:021 AdrEnF Allow forwarding of packets with the MacAd == RcvAddrF.
020:020 AdrEnE Allow forwarding of packets with the MacAd == RcvAddrE.
019:019 AdrEnD Allow forwarding of packets with the MacAd == RcvAddrD.
018:018 AdrEnC Allow forwarding of packets with the MacAd == RcvAddrC.
017:017 AdrEnB Allow forwarding of packets with the MacAd == RcvAddrB.
016:016 AdrEnA Allow forwarding of packets with the MacAd == RcvAddrA.
015:015 TzIpV6 Include tcp port during Toeplitz hashing of TcpIpV6 frames.
014:014 TzIpV4 Include tcp port during Toeplitz hashing of TcpIpV4 frames.
013:000 Rsvd Reserved.
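One convenient way to compose a RcvCfgR value in host-side tooling is a flag enumeration that mirrors the table. The sketch below is an assumption about how such tooling might look, not part of the device; the Python member names are invented, and bit 15 is taken to be the IPv6 counterpart of bit 14.

```python
from enum import IntFlag


class RcvCfg(IntFlag):
    RESET = 1 << 31
    DET_EN = 1 << 30
    RCV_FSH = 1 << 29
    RCV_ENB = 1 << 28
    RCV_ALL = 1 << 27
    RCV_BAD = 1 << 26
    RCV_CTL = 1 << 25
    CMD_ENB = 1 << 24
    ADR_EN_H = 1 << 23
    ADR_EN_G = 1 << 22
    ADR_EN_F = 1 << 21
    ADR_EN_E = 1 << 20
    ADR_EN_D = 1 << 19
    ADR_EN_C = 1 << 18
    ADR_EN_B = 1 << 17
    ADR_EN_A = 1 << 16
    TZ_IPV6 = 1 << 15
    TZ_IPV4 = 1 << 14


def link_address_enable(letter: str) -> RcvCfg:
    """Map a link address letter 'A'..'H' to its AdrEn bit (16..23)."""
    index = "ABCDEFGH".index(letter)
    return RcvCfg(1 << (16 + index))
```

A configuration that enables parsing, socket detection and link address A would then be written as `RcvCfg.RCV_ENB | RcvCfg.DET_EN | link_address_enable("A")`.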
Multicast-Hash Filter Register (FilterR)

Bits Name Description
127:000 Filter Hash bucket enable for multicast filtering.

Link Address Registers H:A (LnkAdrR)

Bits Name Description
047:000 LnkAdr Link receive address. One register for each of the 8 link addresses.

Toeplitz Key Register (TpzKeyR)

Bits Name Description
319:000 TpzKey Toeplitz-hash key register.
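For readers unfamiliar with Toeplitz hashing, the sketch below shows the commonly used 32-bit form: for every set bit of the input, the 32-bit window of the key starting at that bit position is XORed into the result. This is the generic algorithm, not a statement of how Sahara wires it; the bit ordering, the field order of the input and the way the result selects a SktDscGrp are not specified in this section. Note that a 320-bit key is exactly long enough for a 288-bit input (two IPv6 addresses plus two ports).

```python
def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Generic 32-bit Toeplitz hash, processing input bits MSB first."""
    key_bits = len(key) * 8
    data_bits = len(data) * 8
    if key_bits < data_bits + 31:
        raise ValueError("key too short for this input length")
    k = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # 32-bit key window starting at input bit position i*8 + b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (k >> shift) & 0xFFFFFFFF
    return result
```

The function is linear over XOR, so the hash of two inputs XORed together equals the XOR of their hashes; this property is what makes the hash cheap to compute incrementally in hardware.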

Detect Configuration Register (DetCfgR)

Bits Name Description
031:031 Reset Force reset asserted to the socket detector.
030:030 DetEn
029:000 Rsvd Zeroes.

Receive Buffer Queue (RcvBufQ)
Bits Name Description
031:016 Rsvd Reserved.
015:000 RbfId Drm Buffer id.
Receive Mac Queue (RcvMacQ)
if (Type == Data) {
Bits Name Description
035:035 OddPar Odd parity.
034:034 WrdTyp 0-Data.
033:032 WrdSz 0-4 bytes. 1-3 bytes. 2-2 bytes. 3-1 bytes.
031:000 RcvDat Receive data.
} else {
Bits Name Description
035:035 OddPar Odd parity.
034:034 WrdTyp 1-Status.
033:029 Rsvd Zeroes.
028:023 LnkHsh Mac Crc hash bits.
022:022 SvdDet Previous carrier detected.
021:021 LngEvt Long event detected.
020:020 PEarly Receive frame missed.
019:019 DEarly Receive mac queue overrun.
018:018 FcsErr Crc-error detected.
017:017 SymOdd Dribble-nibble detected.
016:016 SymErr Code-violation detected.
015:000 RcvLen Receive frame size (includes crc).
}
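The two RcvMacQ word layouts are distinguished by the WrdTyp bit, so a consumer first tests bit 34 and then interprets the remaining bits accordingly. The sketch below models that dispatch; the function name and the returned dictionaries are illustrative, and the parity bit is not checked because the exact span it covers is not stated here.

```python
def decode_rcvmacq(word: int) -> dict:
    """Decode one 36-bit RcvMacQ entry into a data or status record."""
    word &= (1 << 36) - 1
    if (word >> 34) & 1 == 0:
        # Data word: WrdSz encodes 4, 3, 2 or 1 valid bytes as 0..3.
        return {
            "type": "data",
            "nbytes": 4 - ((word >> 32) & 0x3),
            "data": word & 0xFFFFFFFF,
        }
    return {
        "type": "status",
        "lnk_hsh": (word >> 23) & 0x3F,
        "svd_det": bool((word >> 22) & 1),
        "lng_evt": bool((word >> 21) & 1),
        "p_early": bool((word >> 20) & 1),
        "d_early": bool((word >> 19) & 1),
        "fcs_err": bool((word >> 18) & 1),
        "sym_odd": bool((word >> 17) & 1),
        "sym_err": bool((word >> 16) & 1),
        "rcv_len": word & 0xFFFF,
    }
```

A frame would typically appear as a run of data words followed by a single status word carrying the frame length and error flags.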

Parse Event Queue (PrsEvtQ)
Bits Name Description
315:188 SrcAdr Ip Source Address. IpV4 address is left justified.
187:060 DstAdr Ip Destination Address. IpV4 address is left justified.
059:044 SrcPrt Tcp Source Port.
043:028 DstPrt Tcp Destination Port.
027:020 SktHsh Socket Hash.
019:019 NetVer 1 = IPV6. 0 = IPV4.
018:018 RcvAtn Detect Disable = RcvSta[RcvAtn].
017:016 PktPri Packet priority.
015:000 RbfId Receive Packet Id. (Dram packet buffer id.)

Receive Buffer 

Bytes Name Description
???:018 RcvDat Receive frame data begins here.
017:016 Rsvd Zeroes.
015:012 TpzHsh Toeplitz hash.
011:011 NetIx Network header begins at offset NetIx.
010:010 TptIx Transport header begins at offset TptIx.
009:009 SktHsh Socket hash (Calc TBD).
008:008 LnkHsh Lnk address hash (Crc16[5:0]).
007:006 TptChk Transport checksum.
005:004 RcvLen Receive frame byte count (includes crc).
003:000 RcvSta Receive parse status.

Bits Name Description
031:031 RcvAtn Indicates that any of the following occurred:
  A link error was detected.
  An Ip error was detected.
  A tcp or udp error was detected.
  A link address match was not detected.
  Ip version was not 4 and was not 6.
  Ip fragmented and offset not zero.
  An Ip multicast/broadcast address was detected.
030:025 TptSta Transport status field.
  6'b1x_xxxx Transport error detected.
  6'b10_0011 Transport checksum error.
  6'b10_0010 Transport underflow error.
  6'b10_0001 Reserved.
  6'b10_0000 Transport header length error.
  6'b0x_xxxx No transport error detected.
  6'b01_xxxx Transport flags detected.
  6'b0x_1xxx Transport options detected.

Receive Statistics Reg (RStatsR) 

Bits Name Description 

31:31 Type 0 - Receive vector. 

30:27 Rsvd Zeroes. 

26:26 802.3 Packet format was 802.3.
25:25 BCast Broadcast address detected.
24:24 MCast Multicast address detected.
23:23 SvdDet Previous carrier detected.
22:22 LngEvt Long event detected.
21:21 PEarly Receive frame missed.
20:20 DEarly Receive mac queue overrun.
19:19 FcsErr Crc-error detected.
18:18 SymOdd Dribble-nibble detected.
17:17 SymErr Code-violation detected.
16:16 RcvAtn Copy of RcvSta[RcvAtn].
15:00 RcvLen Receive frame size (includes crc).

Socket Descriptor Buffers (SDscBfs) 2K Pairs x 295b = 75520B

Buffer Word Format - IPV6:
Bits Name Description
294:292 Rsvd Must be zero.
291:290 DmaCd DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
289:289 DetEn 1.
288:288 IpVer 1-IpV6.
287:160 SrcAdr IpV6 Source Address.
159:032 DstAdr IpV6 Destination Address.
031:016 SrcPrt Tcp Source Port.
015:000 DstPrt Tcp Destination Port.

Buffer Word Format - IPV4 Pair:
Bits Name Description
294:293 DmaCd Odd dscr DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
292:292 DetEn Odd dscr enable.
291:290 DmaCd Even dscr DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
289:289 DetEn Even dscr enable.
288:288 IpVer 0-IpV4.
287:192 Rsvd Reserved.
191:160 SrcAdr Odd IpV4 Source Address.
159:128 DstAdr Odd IpV4 Destination Address.
127:112 SrcPrt Odd Tcp Source Port.
111:096 DstPrt Odd Tcp Destination Port.
095:064 SrcAdr Even IpV4 Source Address.
063:032 DstAdr Even IpV4 Destination Address.
031:016 SrcPrt Even Tcp Source Port.
015:000 DstPrt Even Tcp Destination Port.

Detect Command (DetCmdQ) ??? Entries x 32b

Descriptor Disable Format:
Bits Name Description
31:30 CmdCd 0-DetDbl.
29:12 Rsvd Zeroes.
11:00 TcbId TCB identifier.

IPV6 Descriptor Load Format:
Bits Name Description
WORD 0
031:030 CmdCd 1-DscLd.
029:029 DetEn 1.
028:028 IpVer 1-IpV6.
027:014 Rsvd Don't Care.
013:012 DmaCd DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
011:000 TcbId TCB identifier.
WORD 1
031:016 SrcPrt Tcp Source Port.
015:000 DstPrt Tcp Destination Port.
WORDS 5:2
127:000 DstAdr Ip Destination Address.
WORDS 9:6
127:000 SrcAdr Ip Source Address.

IPV4 Descriptor Load Format:
Bits Name Description
WORD 0
031:030 CmdCd 1-DscLd.
029:029 DetEn 1.
028:028 IpVer 0-IpV4.
027:014 Rsvd Don't Care.
013:012 DmaCd DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
011:000 TcbId TCB identifier.
WORD 1
031:016 SrcPrt Tcp Source Port.
015:000 DstPrt Tcp Destination Port.
WORD 2
031:000 DstAdr Ip Destination Address.
WORD 3
031:000 SrcAdr Ip Source Address.

Descriptor Read Format:
Bits Name Description
31:30 CmdCd 2-DscRd.
29:16 Rsvd Zeroes.
15:11 WrdIx Descriptor word select.
10:00 WrdAd TCB identifier.

Event Push Format:
Bits Name Description
31:30 CmdCd 3-DscRd.
29:00 Event Rcv Event Descriptor.

RcvSqr Event Queue (RSqEvtQ) ??? Entries x 32b

Rcv Event Format:
Bits Name Description
31:31 EvtCd 0: RSqEvt.
30:29 DmaCd If EvtCd==RSqEvt. DMA size indicator. 0-16B, 1-96B, 2-128B, 3-192B.
28:28 SkRcv TCB identifier valid.
27:16 TcbId TCB identifier.
15:00 RbfId Drm Buffer id.

Cmd Event Format:
Bits Name Description
31:31 EvtCd 1: CmdEvt.
30:29 RspCd If EvtCd==CmdEvt. Cmd response code. 0-Rsvd, 1-DscRd, 2-EnbEvt, 3-DblEvt.
28:28 SkRcv TCB identifier valid.
27:16 TcbId TCB identifier.
15:00 DscDat Requested SktDsc data.
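Since both event formats share one queue, the CPU has to branch on EvtCd before interpreting bits 30:29 and 15:00. A minimal sketch of that dispatch follows; the function name, the returned dictionary keys and the size lookup table are illustrative assumptions layered on the field definitions above.

```python
DMA_SIZE_BYTES = {0: 16, 1: 96, 2: 128, 3: 192}  # DmaCd encoding


def decode_rsq_evt(evt: int) -> dict:
    """Decode a 32-bit RSqEvtQ entry as either a Rcv or a Cmd event."""
    code = (evt >> 29) & 0x3
    common = {
        "sk_rcv": bool((evt >> 28) & 1),   # TCB identifier valid
        "tcb_id": (evt >> 16) & 0xFFF,
    }
    if (evt >> 31) & 1 == 0:
        # Rcv event: bits 30:29 give the DMA size, bits 15:00 the buffer id.
        return {"kind": "rcv", "dma_bytes": DMA_SIZE_BYTES[code],
                "rbf_id": evt & 0xFFFF, **common}
    # Cmd event: bits 30:29 give the response code, bits 15:00 the data.
    return {"kind": "cmd", "rsp_cd": code, "dsc_dat": evt & 0xFFFF, **common}
```

For a receive event with SkRcv clear, the TcbId field carries no meaning and the frame would be handled without a matched socket.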

TRANSMIT SEQUENCER (XmtSqr/XSq) 

The Transmit Sequencer is depicted in FIG. 33 in situ along with connecting modules. XmtSqr comprises two
functional modules: XmtCmdSqr and XmtFmtSqr. XmtCmdSqr fetches, parses and dispatches [...]
sub module, XmtSrcSqr. XmtFmtSqr receives commands and data from the XmtDmaQ, parses the command, [...]
frame and pushes it on to one of the XmtMacQs. Two modes of operation provide support for either a single ten-gigabit
mac or for four one-gigabit macs.

Transmit Packet Buffer (XmtPktBuf) 2KB, 4KB, 8KB or 16KB

Bytes Name Description

EOB:000 XmtPay Transmit packet payload.

Transmit Configuration Register (XmtCfgR) 

Bits Name Description
31:31 Reset Force reset asserted to the transmit sequencer.
30:30 XmtEnb Allow formatting of transmit packets.
29:29 PseEnb Allow generation of 802.3X control packets.
28:16 PseCnt Pause value to insert in a control packet.
15:00 IpId Ip flow ID initial value.

Transmit Vector Reg (RStatsR)
Bits Name Description
31:28 Rsvd A copy of transmit-buffer descriptor-bits 31:28.
27:27 XmtDn Transmission of the packet was completed.
26:26 DAbort The packet was deferred in excess of 24,287 bit times.
25:25 Defer The packet was deferred at least once, and fewer than the limit.
24:24 CAbort Packet was aborted after CCount exceeded 15.
23:20 CCount Number of collisions incurred during transmission attempts.
19:19 CLate Collision occurred beyond the normal collision window (64B).
18:18 DLate XSq failed to provide timely data.
17:17 CtlPkt Packet was of the 802.3X control format. LnkTyp == 0x8808.
16:16 BCast Packet's destination address was broadcast address.
15:15 MCast Packet's destination address was multicast address.
14:14 ECCErr ECC error detected during dram DMA.
13:00 XmtLen Total bytes transmitted on the wire. 0 = 16KB.

Transmit Mac Queue (XmtMacQ)
if (Type == Data) {
Bits Name Description
35:35 OddPar Odd parity.
34:34 WrdTyp 0-Data.
33:32 WrdSz 0-4 bytes. 1-3 bytes. 2-2 bytes. 3-1 bytes.
31:00 XmtDat Data to transmit.
} else {
Bits Name Description
35:35 OddPar Odd parity.
34:34 WrdTyp 1-Status.
33:18 Rsvd Zeroes.
17:17 CtlPkt Packet was of the 802.3X control format. LnkTyp == 0x8808.
16:16 BCast Packet's destination address was broadcast address.
15:15 MCast Packet's destination address was multicast address.
14:14 ECCErr ECC error detected during dram DMA.
13:00 XmtLen Total bytes to be transmitted on the wire. 0 = 16KB.
}

Transmit High-Priority/Normal-Priority Queue (XmtUrgQ/XmtNmlQ)

Raw Send Descriptor:
Word Bits Name Command Block Description
00 31:30 CmdCd 0: RawPkt.
   29:16 XmtLen Total frame length. 0 == 16KB.
   15:00 XmtBuf Transmit Buffer id.
01 31:00 Rsvd Don't care.
03 31:00 Rsvd Don't care.

Checksum Insert Descriptor:
Word Bits Name Command Block Description
00 31:30 CmdCd 1: ChkIns.
   29:16 XmtLen Total frame length. 0 == 16KB.
   15:00 XmtBuf Transmit Buffer id.
01 31:16 ChkDat Checksum insertion data.
   15:08 Rsvd Zeroes.
   07:00 ChkAd Checksum insertion pointer expressed in 2B words.
02 31:00 Rsvd Don't care.
03 31:00 Rsvd Don't care.

Format Descriptor:
Word Bits Name Command Block Description
00 31:30 CmdCd 2: Format.
   29:16 XmtLen Total frame length. 0 == 16KB.
   15:00 XmtBuf Transmit Buffer id.
01 31:31 TimEnb Tcp timestamp option enable.
   30:30 TcpPsh Sets the tcp push flag.
   29:29 TcpFin Sets the tcp finish flag.
   28:28 IpVer 0: IpV4, 1: IpV6.
   27:27 LnkVln Vlan header format.
   26:26 LnkSnp 802.3 Snap header format.
   25:25 PurAck Pure ack mode. XmtBuf is invalid and should not be recycled.
   24:19 IHdLen Ip header length in 4B dwords. 0 = 256B.
   18:12 PHdLen Protoheader length expressed in 2B words. 0 = 256B.
   11:00 TcbId Specifies a prototype header.
02 31:16 TcpSum Tcp-header partial-checksum.
   15:00 TcpWin Tcp-header window-size insertion-value.
03 31:00 TcpSeq Tcp-header sequence insertion-value.
04 31:00 TcpAck Tcp-header acknowledge insertion-value.
05 31:00 TcpEch Tcp-header time-echo insertion-value. Optional: included if TimEnb==1.
06 31:00 TcpTim Tcp-header time-stamp insertion-value. Optional: included if TimEnb==1.
07 31:00 Rsvd Don't care.

The transmit process steps are:

o CPU pops a XmtBuf off of the XmtBufQ.
o CPU pops a PxyBuf off of the PxyBufQ.
o CPU assembles a transmit descriptor in the PxyBuf.
o CPU pushes a proxy command onto the PxyCmdQ.
o PxySrcSqr pops the command off of the PxyCmdQ.
o PxySrcSqr fetches XmtCmd from PxyBuf.
o PxySrcSqr pushes XmtCmd onto the specified XmtCmdQ.
o PxySrcSqr pushes PxyBuf onto the PxyBufQ.
o XmtCmdSqr pops the XmtCmd off the XmtCmdQ.
o XmtCmdSqr passes XmtCmd to the XmtSrcSqr.
o XmtSrcSqr pushes XmtCmd onto XmtDmaQ.
o XmtSrcSqr, if indicated, fetches [...] from Drm and pushes onto XmtDmaQ.
o XmtSrcSqr, if indicated, fetches transmit [...] from Drm and pushes onto XmtDmaQ.
o XmtSrcSqr pushes ending status onto XmtDmaQ.
o XmtFmtSqr pops XmtCmd off the XmtDmaQ and parses.
o XmtFmtSqr, if indicated, pops header off XmtDmaQ, formats it then pushes it onto the XmtMacQ.
o XmtFmtSqr, if indicated, pops data off XmtDmaQ and pushes onto the XmtMacQ.
o XmtFmtSqr pushes ending status onto XmtMacQ.
o XmtFmtSqr, if indicated, pushes XmtBuf onto XmtBufQ.

CPU 

FIG. 34 is a block diagram of a CPU. The CPU utilizes a vertically-encoded, superpipelined, multi-threaded
microarchitecture. The pipeline stages are synonymous with execution phases and are assigned IDs Phs0 through Phs7.
The virtual CPUs (threads) are assigned IDs CpuId0 through CpuId7. All CPUs execute simultaneously but
each occupies a unique phase during a given clock period. The result is that a virtual CPU (thread) never has multiple
[...]. This arrangement allows 100% utilization of the execution phases since it eliminates
empty pipeline slots and pipeline flushing.

The CPU includes a Writeable Control Store (WCS) capable of storing up to 8K instructions. The instructions are loaded by
the host through a mechanism described in the section Host Cpu Control Port. Every virtual CPU (thread) executes
instructions fetched from the WCS. The WCS includes parity protection which will cause the CPU to halt to avoid data
corruption.

A CPU Control Port allows the host to control the CPU. The host can halt the CPUs and force execution at location zero.
Also, the host can write the WCS, check for parity errors and monitor the global cpu halt bit.

A 2048 word Register File provides simultaneous 2-port-read and 1-port-write access. The File is partitioned into [...]
comprising storage reserved for each of the 32 CPU contexts, each of the 8 CPUs and a global space. The Register File is
parity protected and thus requires initialization prior to usage. Reset disables parity detection enabling the CPU to initialize
the File before enabling detection. Parity errors cause the CPU to halt.

Hardware support for CPU contexts facilitates usage of context specific resources with [...].
File and Global Ram addresses are automatically formed based on the current context. Changing contexts requires
no saving nor restoration of registers and pointers. Thirty-two contexts are implemented which allows CPU processing to
continue while contexts sleep awaiting DMA completion.

CPU snooping is implemented to aid with microcode debug. CPU PC and data are exported via a multilane serial interface
using an XGXS module. Refer to section XXXX and the SACI specification for additional information. See section Snoop
Port for additional information.

Local memory called Global Ram (GlbRam or GRm) is provided for immediate access by the CPUs. The memory is dual
ported; however, one port is inaccessible to the CPU and is reserved for use by the DMA Director (DMD). Global Ram
allows each CPU cycle to perform a read or a write but not both. Due to the delayed nature of [...]
single instructions which perform both a read and a write, but instructions which attempt to read Global Ram immediately
following an instruction which performs a write will result in a CPU trap. This memory is parity protected and requires
initialization. Reset disables parity detection. Parity errors cause the CPU to halt.

Queues are integrated into the CPU utilizing a dedicated memory called Queue Ram (QueRAM or QRm). Similar to the
Global Ram, the memory is dual-ported but the CPU accesses only a single port. DMD accesses the second port [...]
ingress and read egress queues containing data, commands and status. Care must be taken not to [...]
an instruction immediately following an instruction reading any queue or a CPU trap will be performed. This memory is
parity protected and must be initialized. See section Queues for additional information.

A Lock Manager provides several locks for which requests are queued and honored in the order in which they were
received. Locks can be requested or cleared through the use of flags or test conditions. Some flags are dedicated to
locking specific functions. In order to utilize the Math Coprocessor a CPU must be granted a lock. The lock is monitored by
the Coprocessor and must be set before commands will be accepted. This allows single instructions to request the lock,
write the coprocessor registers and perform a conditional jump. Another lock is dedicated to ownership of the Slow Bus
Controller. The remaining locks are available for user definition. See section Lock Manager for additional information.

An Event Manager has been included which monitors events requiring attention and generates vectors to expedite CPU
servicing. The Event Manager is tightly integrated with the CPU and can monitor context state to mask context specific
events. See section Event Manager for additional information.

Instruction Format

The CPU is vertically-microcoded. That is to say that the instruction is divided into ten fields with the control fields
containing encoded values which select operations to be performed. Instructions are fetched from a writable-control-store
and comprise the following fields.

Instruction Fields

Bits Name Description
95:93 SqrCd Program Sequencer Code.
92:92 CCEnb Condition Code Enable.
91:88 AluOp ALU Operation Code.
87:78 SrcA ALU Source Operand A Select.
77:68 SrcB ALU Source Operand B Select.
67:58 Dst ALU Destination Operand Select.
57:41 AdLit Global RAM Address Literal.
40:32 TstCd For Lpt, Rtt, Rbc and Jpt - Program Sequencer Test Code.
      FlgCd For Cnt, Jmp, Jsr and Jsx - Flag Operation Code.
31:16 LitHi For Lpt, Cnt, Rtt and Rbc - Literal Bits 31:16.
      JmpAd For Jmp, Jpt, Jsr and Jsx - Program Jump Address.
15:00 LitLo Literal Bits 15:00.
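The ten fields listed above account for all 96 bits of an instruction word, which can be confirmed with a small table-driven packer. The sketch below is a hypothetical assembler-side helper, not part of the device; the two overloaded fields are given combined names (`TstCd_FlgCd`, `LitHi_JmpAd`) purely for illustration.

```python
# (name, high bit, low bit) for each of the ten instruction fields.
FIELDS = [
    ("SqrCd", 95, 93), ("CCEnb", 92, 92), ("AluOp", 91, 88),
    ("SrcA", 87, 78), ("SrcB", 77, 68), ("Dst", 67, 58),
    ("AdLit", 57, 41), ("TstCd_FlgCd", 40, 32),
    ("LitHi_JmpAd", 31, 16), ("LitLo", 15, 0),
]


def encode_instruction(values: dict) -> int:
    """Pack named field values into a 96-bit word; missing fields are 0."""
    word = 0
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


def decode_instruction(word: int) -> dict:
    """Unpack a 96-bit word into a dict of field values."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, hi, lo in FIELDS}
```

Decoding the result of `encode_instruction` returns the original values, which makes the pair convenient for round-trip checks on microcode images.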
Program Sequence Control (SqrCd). 

The SqrCd field In combination with DbgCtI determines the program sequence as defined in the following table. 

Se uencer Codes 

Condition Code Enable (CCEnb).

The CCEnb field allows the SvdCC register to be updated with the result of an ALU operation.

Condition Code Enable

Name   CCEnb  Description
CCUpd  1'b0   Condition code update is disabled.
CCHld  1'b1   Condition code update is enabled.

Alu Operations (AluOp).

The ALU performs 32-bit operations. All operations utilize two source operands except for the priority encode operation,
which uses only one, and the add carry operation, which uses the "C" bit of the SvdCC register.

Alu Operands (SrcA, SrcB, Dst).

All ALU operations require operands. Source operand codes provide the ALU with data on which to operate and
destination operand codes direct the placement of the ALU product. Operand codes, names and descriptions are listed in
the following tables.

10'b000000XXXX (0:15) - CPU Unique Registers.

Each CPU uses its own unique instance of the following registers.

10'b0000010XXX (16:23) - Context Unique Registers.

Each of the thirty-two CPU contexts has a unique instance of the following registers. Each CPU has a CpCxId register
which selects a set of these registers to be read or modified when using the operand codes defined below. Multiple CPUs
may select the same context register set, but only a single CPU should modify a register to avoid conflicts.

10'b00000110XX (24:27) - Aligned Registers.

These operands provide an alternate method of accessing a subset of those registers that have been defined previously
which contain a field less than 32 bits in length. The operands allow reading and writing these previously defined registers
using the alignment which they would have during use in composite registers.

Name    Opd[9:0]  Description
aCxCbf  24        {13'b0, CxCBId[06:00], 12'b0}, RW. Aligned CxCBId.
aCxHbf  25        {15'b0, CxHBId[00:00], 16'b0}, RW. Aligned CxHBId.
aCxDxs  26        {8'b0, CxDXId[02:00], 21'b0}, RW. Aligned CxDXId.
aCpCtx  27        {4'b0, CpCxId[04:00], 24'b0}, RO. Aligned CpCxId.

10'b00000111XX (28:31) - Composite Registers.

These operands provide an alternate method of accessing a subset of those registers that have been defined previously
which contain a field less than 32 bits in length. The operands allow reading and writing various composites of these
previously defined registers. This has the effect of reading and merging or extracting and writing several registers with a
single instruction.

Name    Opd[9:0]  Description
CpsRgA  28        (aCpCtx | aCxDxs), RO.
CpsRgB  29        (aCpCtx | aCxDxs | aCxHbf), RO.
CpsRgC  30        (aCpCtx | aCxDxs | aCxCbf | CxTcId), RO.
CpsRgD  31        (aCpCtx | aCxDxs | NEQRr), RO.
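As an illustration of how a composite read merges several aligned registers, the following Python sketch ORs the aligned values together. The shift amounts are taken from the low-order zero padding listed for the aligned registers (12, 16, 21 and 24 bits); the function names are hypothetical, and the sketch models only CpsRgA and CpsRgB.

```python
def aligned(cx_cb_id: int, cx_hb_id: int, cx_dx_id: int, cp_cx_id: int) -> dict:
    """Place each narrow ID field at the bit position it has in a composite."""
    return {
        "aCxCbf": (cx_cb_id & 0x7F) << 12,  # CxCBId[06:00] above 12 zero bits
        "aCxHbf": (cx_hb_id & 0x01) << 16,  # CxHBId[00:00] above 16 zero bits
        "aCxDxs": (cx_dx_id & 0x07) << 21,  # CxDXId[02:00] above 21 zero bits
        "aCpCtx": (cp_cx_id & 0x1F) << 24,  # CpCxId[04:00] above 24 zero bits
    }


def cps_rg_a(a: dict) -> int:
    """CpsRgA: context ID merged with DMA descriptor index."""
    return a["aCpCtx"] | a["aCxDxs"]


def cps_rg_b(a: dict) -> int:
    """CpsRgB: context ID, DMA descriptor index and header buffer ID."""
    return a["aCpCtx"] | a["aCxDxs"] | a["aCxHbf"]
```

Because the fields combined in each composite occupy disjoint bit ranges, a single read returns all of them without any shifting in microcode.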

10'b00001000XX, 10'b000010010X (32:37) - Instruction Literals.

These source operands facilitate various modes of access of the instruction literal fields.

Name    Opd[9:0]  Description
LitSR0  32        {16'h0000, LitLo}, RO.
LitSR1  33        {16'hffff, LitLo}, RO.
LitSL0  34        {LitLo, 16'h0000}, RO.
LitSL1  35        {LitLo, 16'hffff}, RO.
LitLrg  36        {LitHi, LitLo}, RO.
AdrLit  37        {15'h0000, AdLit}, RO.
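The six literal access modes can be sketched in Python as follows. The operand numbers and concatenations are those in the table above; the function name `literal_operand` is hypothetical.

```python
def literal_operand(opd: int, lit_hi: int, lit_lo: int, ad_lit: int) -> int:
    """Return the 32-bit value a literal source operand presents to the ALU."""
    lit_hi &= 0xFFFF
    lit_lo &= 0xFFFF
    ad_lit &= 0x1FFFF                    # AdLit is a 17-bit field
    if opd == 32:                        # LitSR0: {16'h0000, LitLo}
        return lit_lo
    if opd == 33:                        # LitSR1: {16'hffff, LitLo}
        return 0xFFFF0000 | lit_lo
    if opd == 34:                        # LitSL0: {LitLo, 16'h0000}
        return lit_lo << 16
    if opd == 35:                        # LitSL1: {LitLo, 16'hffff}
        return (lit_lo << 16) | 0xFFFF
    if opd == 36:                        # LitLrg: {LitHi, LitLo}
        return (lit_hi << 16) | lit_lo
    if opd == 37:                        # AdrLit: AdLit zero-extended to 32 bits
        return ad_lit
    raise ValueError("not an instruction literal operand")
```

The ones-filled variants (LitSR1, LitSL1) are convenient for building masks from a single 16-bit literal.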

10'b000010011X (38:39) - Slow Bus Registers.

These operands provide access to the Slow Bus Controller. See Slow Bus Subsystem for a more detailed description.

Name    Opd[9:0]  Description
SlwDat  38        {SlwDat[31:00]}, WO. Slow Bus Data.
SlwAdr  39        {SglSel[3:0], RegSel[27:00]}, WO. Slow Bus Address.
10'b0000101XXX (40:47) - Context and Event Control Registers.

These operands facilitate control of CPU events and contexts. See the section Event Manager for a more detailed
description.

Name    Opd[9:0]  Description
CtxIdl  40        {CtxIdl[31:00]}, R/W. Idling Context Flags.
CtxSlp  41        {CtxSlp[31:00]}, R/W. Sleeping Context Flags.
CtxBsy  42        {CtxBsy[31:00]}, R/W. Busy Context Flags.
CtxSvr  43        {27'b0, CtxId[04:00]}, RO. Free Context Server.
EvtDbl            {20'b0, EvtBit[11:00]}, WO. Global Event Disable Bits.
EvtEnb            {20'b0, EvtBit[11:00]}, R/W. Global Event Enable Bits.
EvtVec  45        {3'b0, Ctx[4:0], 20'b0, Vec[3:0]}, RO. Event Vector.
DmdErr  46        {DmdErr[31:00]}, R/W. Dmd DMA Context Err Flags.
Rsvd    47        Reserved.

10'b000011XXXX (48:63) - TCB Manager Registers.

These operands facilitate control of TCB and Cache Buffer state. See the section TCB Manager for a more detailed
description.

10'b0001000XXX (64:71) - CPU Debug Registers.

These operands facilitate control of CPUs. See the section Debug Control for a more detailed description.

Name    Opd[9:0]  Description
CpuHlt  64        WO. CPU Halt bits.
CpuRun  65        WO. CPU Run bits.
CpuStp  66        WO. CPU Step bits.
CpuDbg  67        WO. CPU Debug bits.
TgrSet  68        Trigger Flag set bits. Bit per CPU plus one global.
TgrClr  69        Trigger Flag clear bits. Bit per CPU plus one global.
DbgOpd  70        WO. Debug Operands.
DbgDat  71        R/W. Debug Data.

10'b00010010XX (72:75) - Math Coprocessor Registers.

These operands facilitate control of the Math Coprocessor. See the section Math Coprocessor for a more detailed
description.
a memory 

Writing zeros its normal 

10'b000101XXXX (80:95) - Sequence Servers.

There are eight incrementers which will provide a sequence to the CPU when read, without the need for locking,
modifying and unlocking. The servers are paired such that one functions as a request sequence and the other functions
as a service sequence. Refer to test conditions for more information. The sequence servers can also be treated
independently. A server can be read with either its primary or secondary address. Reading a server with its secondary
address causes the server to post-increment. Writing a server causes the server to initialize to zero.

Name   Opd[9:1]        Description
Seq8   10'b0001010XXX  {24'b0, Seq8[07:00]}, RW, Inc = Opd[0].
Seq16  10'b0001011XXX  {16'b0, Seq16[15:00]}, RW, Inc = Opd[0].
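A behavioural Python sketch of one sequence server follows. It models only what the text states: a read returns the current value, a read through the secondary address (Opd[0] set) post-increments, and a write re-initializes the server to zero. The class name and the ticket-style usage of a request/service pair are assumptions made for illustration, not the device's documented protocol.

```python
class SequenceServer:
    """Behavioural model of one incrementer (8-bit or 16-bit)."""

    def __init__(self, bits: int):
        self.mask = (1 << bits) - 1
        self.value = 0

    def read(self, increment: bool = False) -> int:
        """Return the current sequence; post-increment when increment is set."""
        current = self.value
        if increment:
            self.value = (self.value + 1) & self.mask  # wraps at the field width
        return current

    def write(self, _data: int = 0) -> None:
        """Any write initializes the server to zero."""
        self.value = 0


# Hypothetical ticket-style use of a paired request and service sequence.
request, service = SequenceServer(8), SequenceServer(8)
my_ticket = request.read(increment=True)    # take a ticket in a single read
my_turn = service.read() == my_ticket       # compare without incrementing
```

Because taking a ticket is a single read, no lock, modify and unlock sequence is needed to hand out unique sequence numbers.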
10'b00011XXXXX (96:127) - Reserved.

10'b0010XXXXXX, 10'b00110XXXXX (128:223) - Constants.

Constants provide an alternative to using the instruction literal field.

10'b001110XXXX, 10'b00111100XX, 10'b001111010X (224:245) - Reserved.

10'b001111011X (246:247) - NIC Event Queue Servers.

Bunch of verbiage here. NEQId = CpQId[2:0].

Name   Opd[9:0]        Description
EvtSq  10'b0011110110  {RlsSeq[NEQId][15:0], WrtSeq[NEQId][15:0]}, RW.
EvtAd  10'b0011110111  {NEQId, WrtSeq[NEQId][NEQSz:00]}, RO, autoincrements if WrtSeq != RlsSeq.
EvtAd  10'b0011110111  {RlsSeq[15:00], 16'b0}, WO.

10'b0011111XXX, 10'b010XXXXXXX (248:383) - Queue Registers.

These operands facilitate control of the queues. See the section Queues for a more detailed description.

Name   Opd[9:0]        Description
Rsvd   10'b001111100X  Reserved.
QSBfS  10'b0011111010  R, QId = {2'b0, CpCxId}. SysBufQ status.
QSBfD  10'b0011111011  RW, QId = {2'b0, CpCxId}. SysBufQ data.
QRspS                  R, QId = {2'b1, CpCxId}. DmdRspQ status.
QRspD                  RW, QId = {2'b1, CpCxId}. DmdRspQ data.
QCpuS                  QId = CpQId. Queue status-indirect.
QCpuD                  RW, QId = CpQId. Queue data-indirect.
QImmS  10'b01010XXXXX  QId = {1'b1, Opd[4:0]}. Queue status-direct.
QImmD  10'b01011XXXXX  RW, QId = {1'b1, Opd[4:0]}. Queue data-direct.

10'b011XXXXXXX (384:511) - Global Ram Operands.

These operands provide multiple methods to address Global Ram. The last three operands support automatic post-
incrementing. The increment is controlled by the operand select bit Opd[3] and takes place after the address has been
compiled. All operands utilize bits [2:0] to control byte swapping and size as shown below.

Opd[2:2] Transpose: 0-NoSwap, 1-Swap
Opd[1:0] DataSize:  0-4B, 1-3B, 2-2B, 3-1B

Hardware detects conditions where reading or writing of data crosses a word boundary and causes the program counter to
load with the trap vector. The following shows how address and Opd[2:0] affect the Global Ram data presented to the
ALU.

Transpose  ByteOffset  GRmData  4B    3B    2B    1B
0          0           abcd     abcd  0bcd  00cd  000d
0          1           abcX     trap  0abc  00bc  000c
0          2           abXX     trap  trap  00ab  000b
0          3           aXXX     trap  trap  trap  000a
1          0           abcd     dcba  0dcb  00dc  000d
1          1           abcX     trap  0cba  00cb  000c
1          2           abXX     trap  trap  00ba  000b
1          3           aXXX     trap  trap  trap  000a

The following shows how address and Opd[2:0] affect the ALU data presented to the Global Ram.

Transpose  DataSize  AluOut  OF=0  OF=1  OF=2  OF=3
0          4B        abcd    abcd  trap  trap  trap
0          3B        Xbcd    -bcd  bcd-  trap  trap
0          2B        XXcd    --cd  -cd-  cd--  trap
0          1B        XXXd    ---d  --d-  -d--  d---
1          4B        abcd    dcba  trap  trap  trap
1          3B        Xbcd    -dcb  dcb-  trap  trap
1          2B        XXcd    --dc  -dc-  dc--  trap
1          1B        XXXd    ---d  --d-  -d--  d---
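The two byte-lane tables can be modelled with a short Python sketch. It treats a Global Ram word as a 32-bit integer whose byte "a" is most significant, uses the Opd[2:0] encoding given above, and raises a hypothetical `GRmTrap` exception where the tables say "trap". The function names are illustrative only.

```python
class GRmTrap(Exception):
    """Raised when an access would cross a 32-bit word boundary."""


SIZE_BYTES = {0: 4, 1: 3, 2: 2, 3: 1}   # Opd[1:0] DataSize encoding


def _swap(value: int, nbytes: int) -> int:
    """Reverse the byte order of an nbytes-wide value."""
    return int.from_bytes(value.to_bytes(nbytes, "big"), "little")


def grm_read(word: int, offset: int, opd: int) -> int:
    """Data presented to the ALU for a read at byte offset 0..3."""
    nbytes = SIZE_BYTES[opd & 3]
    if offset + nbytes > 4:
        raise GRmTrap("read crosses word boundary")
    value = (word >> (8 * offset)) & ((1 << (8 * nbytes)) - 1)
    if opd & 4:                          # Opd[2]: Transpose
        value = _swap(value, nbytes)
    return value


def grm_write(old_word: int, alu_out: int, offset: int, opd: int) -> int:
    """New word contents after writing the low bytes of alu_out at offset."""
    nbytes = SIZE_BYTES[opd & 3]
    if offset + nbytes > 4:
        raise GRmTrap("write crosses word boundary")
    mask = (1 << (8 * nbytes)) - 1
    value = alu_out & mask
    if opd & 4:
        value = _swap(value, nbytes)
    shift = 8 * offset
    # Bytes outside the written lanes keep their old contents ("-" in the table).
    return (old_word & ~(mask << shift) & 0xFFFFFFFF) | (value << shift)
```

For example, a 3-byte transposed read at byte offset 1 of the word abcd returns 0cba, matching the second row of the transposed half of the read table.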

Global Ram Operands - continued.

Name    Opd[9:0]        Description
Rsvd    10'b01100XXXXX  Reserved.
GTRQWr  10'b0110100XXX  GCchBf + CxCCtl[TRQWrSq]; WO. Write to TCB receive queue.
GTRQRd  10'b0110100XXX  GCchBf + CxCCtl[TRQRdSq]; RO. Read from TCB receive queue.
GTBMap  10'b0110101XXX  TCB Bit Map. GRm[TMapBs + (CxTcId>>5)];
GLitAd  10'b0110110XXX  Global Ram. GRm[AdLit];
GCchBf  10'b0110111XXX  Cache Buffer. GRm[CchBBs + (CxCBId * CchBSz) + AdLit];
GDmaBf  10'b0111000XXX  DMA descriptor. GRm[DmaBBs + {CpCxId, CxDXId, 4'b0} + AdLit];
GHdrBf  10'b0111001XXX  Header Buffer. GRm[HdrBBs + ({CpCxId, CxHBId} * HdrBSz) + AdLit];
GHdrIx  10'b011101XXXX  Header Buffer Indexed. If Opd[3] CpHbIx++; GRm[GHdrBf + CpHbIx];
GCtxAd  10'b011110XXXX  Global Ram ctx address. If Opd[3] CxGRAd++; GRm[CxGRAd + AdLit];
GCpuAd  10'b011111XXXX  Global Ram cpu address. If Opd[3] CpGRAd++; GRm[CpGRAd + AdLit];
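The address expressions in the table can be worked through in Python. This is a sketch under stated assumptions: field widths are taken from the aligned register definitions (CpCxId is 5 bits, CxDXId 3 bits, CxHBId 1 bit), the size arguments are the decoded buffer sizes rather than raw register contents, and the function names are hypothetical.

```python
def gcchbf_addr(cch_bbs: int, cx_cb_id: int, cch_bsz: int, ad_lit: int) -> int:
    """GCchBf: CchBBs + (CxCBId * CchBSz) + AdLit."""
    return cch_bbs + cx_cb_id * cch_bsz + ad_lit


def gdmabf_addr(dma_bbs: int, cp_cx_id: int, cx_dx_id: int, ad_lit: int) -> int:
    """GDmaBf: DmaBBs + {CpCxId, CxDXId, 4'b0} + AdLit."""
    # Concatenation: 5-bit context, 3-bit descriptor index, 4 zero bits,
    # which gives each descriptor a 16-byte slot.
    index = ((cp_cx_id & 0x1F) << 7) | ((cx_dx_id & 0x7) << 4)
    return dma_bbs + index + ad_lit


def ghdrbf_addr(hdr_bbs: int, cp_cx_id: int, cx_hb_id: int,
                hdr_bsz: int, ad_lit: int) -> int:
    """GHdrBf: HdrBBs + ({CpCxId, CxHBId} * HdrBSz) + AdLit."""
    buffer_number = ((cp_cx_id & 0x1F) << 1) | (cx_hb_id & 0x1)
    return hdr_bbs + buffer_number * hdr_bsz + ad_lit
```

Under these assumptions each context owns eight DMA descriptor slots and two header buffers, selected by CxDXId and CxHBId respectively.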
10'b1XXXXXXXXX (512:1023) - Register File Operands.

These operands provide multiple methods to address the Register File. The Register File has three partitions comprising
CPU space, Context space and Shared space.

Name    Opd[9:0]        Description
FCxWin  10'b1000XXXXXX  Context File Window. RFl[CxFlBs + (CpCxId * CxFlSz) + OpdX[5:0]];
FCpWin  10'b1001XXXXXX  CPU File Window. RFl[CpFlBs + (CpId * CpFlSz) + OpdX[5:0]];
FCxFAd  10'b1010XXXXXX  Context File Address. RFl[CxFlAd + OpdX[5:0]];
FCpFAd  10'b1011XXXXXX  CPU File Address. RFl[CpFlAd + OpdX[5:0]];
FShWin  10'b11XXXXXXXX  Shared File Window. RFl[ShFlBs + OpdX[5:0]];
Global Ram Address Literal (AdLit).

This field supplies a literal which is used in forming an address for accessing Global Ram.

Name   Description
AdLit  AdLit[16:0]
Test Operations (TstCd).

Instruction bits [40:32] serve as the FlgCd and TstCd fields. They serve as the TstCd for Lpt, Rtt, Rtx and Jpt instructions.
TstCd[8] forces an inversion of the selected test result. Test codes are defined in the following table.

Test Codes

Name    TstCd[7:0]   Description
True    0            Always true.
CurC32  1            Current alu carry.
CurV32  2            Current alu overflow.
CurN32  3            Current alu negative.
CurZ32  4            Current 32b zero.
CurZ64  5            Current 64b zero. (CurZ32 & SvdZ32);
CurULE  6            Current unsigned less than or equal. (CurZ32 | ~CurC32);
CurSLT  7            Current signed less than. (CurN32 ^ CurV32);
CurSLE  8            Current signed less than or equal. (CurN32 ^ CurV32) | CurZ32;
SvdC32  9            Saved alu carry.
SvdV32  10           Saved alu overflow.
SvdN32  11           Saved alu negative.
SvdZ32  12           Saved 32b zero.
SvdULE  13           Saved unsigned less than or equal. (SvdZ32 | ~SvdC32);
SvdSLT  14           Saved signed less than. (SvdN32 ^ SvdV32);
SvdSLE  15           Saved signed less than or equal. (SvdN32 ^ SvdV32) | SvdZ32;
SeqTst  8'b000100XX  Sequence Server test. TstCd[1:0] selects one of 4 pairs.
MthErr  20           Math Coprocessor error. Divide by 0 or multiply overflow.
MthBsy  21           Math Coprocessor busy.
NEQRdy  22           NEQRlsSq[NEQId] != NEQWrtSeq[NEQId].
Rsvd    23:159       Reserved.
AluBit  8'b101XXXXX  Test alu data bit. AluDt[TstOp[4:0]];
LkITst  8'b1100XXXX  Test immediate lock. LockI[TstOp[4:0]];
LkIReq  8'b1101XXXX  Request and test immediate lock. LockI[TstOp[4:0]];
LkQTst  8'b1110XXXX  Test queued lock. LockQ[TstOp[4:0]];
LkQReq  8'b1111XXXX  Request and test queued lock. LockQ[TstOp[4:0]];
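The derived comparison tests can be checked with a small Python model of a 32-bit subtract. One point is an assumption not stated in the text: the carry flag is taken to be the carry-out of A + ~B + 1, so C = 1 means "no borrow". Under that convention the table's expressions for unsigned less-than-or-equal and the signed comparisons hold for A compared with B. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sub_flags(a: int, b: int) -> dict:
    """Flags after computing a - b on 32-bit values (carry = no borrow)."""
    a &= MASK32
    b &= MASK32
    total = a + ((~b) & MASK32) + 1
    result = total & MASK32
    return {
        "C": total >> 32,                                # carry-out of the adder
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,       # signed overflow
        "N": result >> 31,                               # sign of the result
        "Z": int(result == 0),
    }


def cur_ule(f: dict) -> int:
    """CurULE: CurZ32 | ~CurC32."""
    return f["Z"] | (f["C"] ^ 1)


def cur_slt(f: dict) -> int:
    """CurSLT: CurN32 ^ CurV32."""
    return f["N"] ^ f["V"]


def cur_sle(f: dict) -> int:
    """CurSLE: (CurN32 ^ CurV32) | CurZ32."""
    return (f["N"] ^ f["V"]) | f["Z"]
```

The saved variants (SvdULE, SvdSLT, SvdSLE) apply the same expressions to the flags held in SvdCC.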
Flag Operations (FlgCd).

Instruction bits [40:32] serve as the FlgCd and TstCd fields. They serve as the FlgCd for Cnt, Jmp, Jsr and Jsx
instructions. Flag codes are defined in the following table.

Flag Codes

Name    FlgCd[8:0]    Description
Rsvd    0:127         Reserved.
LdPc    128           Reserved.
Rsvd    129:191       Reserved.
LkIClr  9'b01100XXXX  Clear immediate lock. See section Lock Manager.
LkIReq  9'b01101XXXX  Request immediate lock. See section Lock Manager.
LkQClr  9'b01110XXXX  Clear queued lock. See section Lock Manager.
LkQReq  9'b01111XXXX  Request queued lock. See section Lock Manager.
Rsvd    256:511       Reserved.
Jump Address (JmpAd).

Instruction bits [31:16] serve as the JmpAd and LitHi fields. They serve as the JmpAd for Jmp, Jpt, Jsr and Jsx
instructions.

Name   Description
JmpAd  JmpAd[15:00]

Literal High (LitHi).

Instruction bits [31:16] serve as the JmpAd and LitHi fields. They serve as the LitHi for Lpt, Cnt, Rtt and Rtx instructions.
LitHi can be used with LitLo to form a 32-bit literal.

Name   Description
LitHi  LitHi[15:00]

Literal Low (LitLo).

Instruction bits [15:00] serve as the LitLo field.

Name   Description
LitLo  LitLo[15:00]
CPU Control Port

The host requires a means to halt the CPU, download microcode and force execution at location zero. That means is
provided by the CPU Control Port. The port also allows the host to monitor CPU status.

SACI Port.

FIG. 35 shows a Snoop Access and Control Interface (SACI) Port that facilitates the exporting of snooped data from the
CPU to an external device for storage and analysis. This is intended to function as an aid for the debugging of microcode. A
snoop module monitors CPU signals as shown in FIG. 35, then presents the signals to the XGXS module for export to an
external adaptor. The "Msc" signal group includes the signals ExeEnb, CpuTgr, GlbTgr and a reserved signal. A table can
specify the snoop data and the order in which it is exported for the four possible configurations.

DEBUG (Dbg)

Describe function of debug registers here.

Halt, run, stop, debug, trigger, debug operand and debug data.

Debug Operand allows the selection of the AluSrcB and AluDst operands for CPU debug cycles.

Debug Source Data is written to by the debug master. It can be specified in the AluSrcB field of DbgOpd to force writing of
data to the destination specified in AluDst. This mechanism can be used to push onto the stack, or to write the PC or
CPU-specific registers which are otherwise not accessible to the debug master.

Debug Destination Data is written to by the debug slave. It is specified in the AluDst field of DbgOpd to force saving of data
specified in the AluSrcB field. This allows reading from the stack, the PC or CPU-specific registers which are otherwise not
accessible to the debug master.

LOCK MANAGER (LckMgr/LMg)

A Register Transfer Language (RTL) description of a lock manager is shown below, and a block diagram of the lock
manager is shown in FIG. 36.

reg RstLQ;
wire LclRstL = ScanMd ? RstL : RstLQ;
always @ (posedge Clk or negedge RstL)
  if (!RstL) RstLQ <= 0;
  else       RstLQ <= !SoftRst;

//Cpu Id.
//CpuId is used to determine which cpu's lock or service requests to service.
always @ (posedge Clk or negedge LclRstL) begin
  if (!LclRstL) CpuId <= 0;
  else          CpuId <= CpuId + 1;
end



//Cpu Lock Requests
//CpLckReq re-circulates. It is serviced at phase 0 only, where it may be set or cleared.
always @ (posedge Clk or negedge LclRstL) begin
  if (!LclRstL)
    for (iPhs=0; iPhs<`qCpus; iPhs=iPhs+1)
      CpLckReq[iPhs] <= 0;
  else
    for (iLck=0; iLck<`qLocks; iLck=iLck+1) begin
      if (LckCyc & (LckId==iLck))
        CpLckReq[0][iLck] <= LckSet;
      else
        CpLckReq[0][iLck] <= CpLckReq[`qCpus-1][iLck];
      for (iPhs=1; iPhs<`qCpus; iPhs=iPhs+1)
        CpLckReq[iPhs][iLck] <= CpLckReq[iPhs-1][iLck];
    end
end

//Cpu Service Pending
//CpSvcPnd is set or cleared in phase 1 only. CpSvcPnd is always forced set if CpLckReq is set.
//CpSvcPnd remains set until CpLckReq is reset and the output of CpSvcQue indicates that the current
//cpu is being serviced.
always @ (posedge Clk or negedge LclRstL) begin
  if (!LclRstL)
    for (iPhs=0; iPhs<`qCpus; iPhs=iPhs+1)
      CpSvcPnd[iPhs] <= 0;
  else begin
    CpSvcPnd[0] <= CpSvcPnd[`qCpus-1];
    for (iLck=0; iLck<`qLocks; iLck=iLck+1) begin
      if (!CpLckReq[0][iLck] & CpSvcVld[0][iLck] & (CpuId==CpSvcQue[0][iLck]))
        CpSvcPnd[1][iLck] <= 1'b0;
      else
        CpSvcPnd[1][iLck] <= CpSvcPnd[0][iLck] | CpLckReq[0][iLck];
      for (iPhs=2; iPhs<`qCpus; iPhs=iPhs+1)
        CpSvcPnd[iPhs][iLck] <= CpSvcPnd[iPhs-1][iLck];
    end
  end
end
//Service Queues
//CpSvcQue is modified at phase 1 only. There is a CpSvcQue per lock. The output of CpSvcQue
//indicates which cpu req/rls is to be serviced. When the corresponding cpu is at phase 1 its
//CpLckReq is examined and if reset will cause a shift out cycle for the CpSvcQue. If the current
//cpu is different from the CpSvcQue output and a CpuId has not yet been entered in to the CpSvcQue
//as indicated by CpSvcPnd, then the current CpuId will be written to the CpSvcQue.
always @ (posedge Clk or negedge LclRstL)
  if (!LclRstL)
    for (iLck=0; iLck<`qLocks; iLck=iLck+1) begin
      for (iEntry=0; iEntry<`qLockRqrs; iEntry=iEntry+1) begin
        CpSvcQue[iEntry][iLck] <= 0;
        CpSvcVld[iEntry][iLck] <= 0;
      end
    end
  else
    for (iLck=0; iLck<`qLocks; iLck=iLck+1) begin
      if (!CpLckReq[0][iLck]) begin
        //Request is reset: shift the queue if this cpu is at its head.
        if (CpSvcVld[0][iLck] & (CpSvcQue[0][iLck]==CpuId)) begin
          for (iEntry=0; iEntry<(`qLockRqrs-1); iEntry=iEntry+1) begin
            CpSvcQue[iEntry][iLck] <= CpSvcQue[iEntry+1][iLck];
            CpSvcVld[iEntry][iLck] <= CpSvcVld[iEntry+1][iLck];
          end
          for (iEntry=(`qLockRqrs-1); iEntry<`qLockRqrs; iEntry=iEntry+1) begin
            CpSvcQue[iEntry][iLck] <= 0;
            CpSvcVld[iEntry][iLck] <= 0;
          end
        end
      end
      else begin
        //Request is set and not yet queued: enter CpuId at the first free entry.
        if (!CpSvcPnd[0][iLck]) begin
          for (iEntry=0; iEntry<1; iEntry=iEntry+1) begin
            if (!CpSvcVld[iEntry][iLck]) begin
              CpSvcQue[iEntry][iLck] <= CpuId;
              CpSvcVld[iEntry][iLck] <= 1'b1;
            end
          end
          for (iEntry=1; iEntry<`qLockRqrs; iEntry=iEntry+1) begin
            if (!CpSvcVld[iEntry][iLck] & CpSvcVld[iEntry-1][iLck]) begin
              CpSvcQue[iEntry][iLck] <= CpuId;
              CpSvcVld[iEntry][iLck] <= 1'b1;
            end
          end
        end
      end
    end

//Lock Grants
//LckGnt is set or cleared in phase 1 only. LckGnt is set if CpSvcQue indicates that the current cpu is
//being serviced and CpLckReq is set.
always @ (posedge Clk or negedge LclRstL) begin
  if (!LclRstL)
    LckGnt <= 0;
  else
    for (iLck=0; iLck<`qLocks; iLck=iLck+1) begin
      if (CpLckReq[0][iLck] & CpSvcVld[0][iLck] & (CpSvcQue[0][iLck]==CpuId))
        LckGnt[iLck] <= 1'b1;
      else if (CpLckReq[0][iLck] & !CpSvcVld[0][iLck])
        LckGnt[iLck] <= 1'b1;
      else
        LckGnt[iLck] <= 1'b0;
    end
end

//My Lock Test
//MyLck is serviced in phase 3 only.
always @ (posedge Clk or negedge LclRstL) begin
  if (!LclRstL) MyLck <= 0;
  else          MyLck <= LckGnt[TstSel];
end
endmodule
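The queued-lock behaviour described by the RTL comments can be summarised with a deliberately simplified Python model. It keeps one first-in, first-out service queue for a single lock and grants the lock only to the CPU at the head of that queue. It ignores the phase pipeline, the recirculating request registers and the queue depth limit, and the class and method names are hypothetical.

```python
from collections import deque


class QueuedLock:
    """Behavioural sketch of one queued lock with a FIFO service queue."""

    def __init__(self):
        self.queue = deque()   # CPU ids in the order they first requested

    def request(self, cpu_id: int) -> bool:
        """Request the lock; returns True when this CPU holds the grant."""
        if cpu_id not in self.queue:
            self.queue.append(cpu_id)        # entered only once while pending
        return self.queue[0] == cpu_id       # granted only at the queue head

    def release(self, cpu_id: int) -> bool:
        """Clear the request; the queue shifts only if the holder releases."""
        if self.queue and self.queue[0] == cpu_id:
            self.queue.popleft()
            return True
        return False
```

A CPU that is refused simply repeats its request; because its id is already queued, the repeat does not change its place in line, which is what makes the grant order fair.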
SLOWBUS CONTROLLER

The slow bus controller comprises a Slow Data Register (SlwDat), a Slow Address Register (SlwAdr) and a Slow Decode
Register (SlwDec). SlwDat sources data for a 32-bit data bus which connects to registers within each of Sahara's functional
modules. The SlwDec decodes the SlwAdr[SglSel] bits and asserts CfgLd signals, which are subsequently synchronized by
their target modules and then used to enable loading of the target register selected by the SlwDec[RegSel] bits.

Multiple cycles are required for setup of SlwDat to the destination registers because the SlwDat bus is heavily loaded.
Because of this, only a single CPU can access slow registers at a given time. This access is moderated
by a queued lock. Queued lock xx must be acquired before a CPU can successfully write to slow registers. Failure to obtain
the lock will cause the write to be ignored. The eight-level pipeline architecture of the CPU ensures that a single CPU will
allow eight clock cycles of setup and hold for slow data. A minimum of three destination clock cycles are needed to ensure
that data is captured. This means that if the destination to CPU clock frequency ratio is less than .375
(SqrClkFrq/CpuClkFrq) then a delay must be inserted between steps 2 and 3, and steps 4 and 5. The CPU ucode should
perform the steps shown in FIG. 37.

Insert module select and register select and register definition tables here.
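The .375 threshold follows from requiring three destination clocks inside the eight CPU clocks of setup and hold, that is 3/8. A minimal Python check, with a hypothetical function name and integer frequencies in any common unit, could look like this:

```python
def slow_write_needs_delay(dest_clk: int, cpu_clk: int,
                           window_cpu_clocks: int = 8,
                           dest_clocks_needed: int = 3) -> bool:
    """True when fewer than the needed destination clocks fit in the window.

    The window holds window_cpu_clocks * dest_clk / cpu_clk destination
    clocks; the comparison is cross-multiplied to stay in integers.
    """
    return dest_clk * window_cpu_clocks < dest_clocks_needed * cpu_clk
```

With the default arguments this returns True exactly when the destination to CPU frequency ratio is below 3/8.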
Dispatch Queue Base (CmdQBs)

GlbRam address at which the first Dispatch Queue resides. Used by the CPU while writing to a dispatch queue and by the
DMA Dispatcher while reading from a dispatch queue.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  CmdQBs  Start of GRm based Dispatch Queues. Bits[9:0] are always zeroes.

Response Queue Base (EvtQBs)

GlbRam address at which the first Response Queue resides. Used by the CPU while reading from a response queue and
by the DMA Response Sequencer while writing to a response queue.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  EvtQBs  Start of GRm based Response Queues. Bits[9:0] are always zeroes.

DMA Descriptor Base (DmaBBs)

GlbRam address at which the first DMA Descriptor resides. Used by the CPU, DMA Dispatcher and DMA Response
Sequencer.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  DmaBBs  Start of GRm based DMA Descriptors. Bits[9:0] are always zeroes.
Header Buffer Base (HdrBBs)

GlbRam address at which the first Header Buffer resides. Used by the CPU and DMA Dispatcher.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  HdrBBs  Start of GRm based Header Buffers. Bits[9:0] are always zeroes.

Header Buffer Size (HdrBSz)

Size of the Header Buffers. Used by the CPU and DMA Dispatcher to determine the GlbRam location of successive
Header Buffers. An entry of 0 indicates a size of 256.

Bits   Name    Description
31:08  Rsvd    Ignored.
07:00  HdrBSz  Size of GRm based Header Buffers. Bits[4:0] are always zeroes.
TCB Map Base (TMapBs)

GlbRam address at which the TCB Bit Map resides. Used by the CPU.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  TMapBs  Start of GRm based TCB Bit Map. Bits[9:0] are always zeroes.

Cache Buffer Base (CchBBs)

GlbRam address at which the first Cache Buffer resides. Used by the CPU and DMA Dispatcher.

Bits   Name    Description
31:17  Rsvd    Ignored.
16:00  CchBBs  Start of GRm based Cache Buffers. Bits[9:0] are always zeroes.

Cache Buffer Size (CchBSz)

Size of the Cache Buffers. Used by the CPU and DMA Dispatcher to determine the GlbRam location of successive Cache
Buffers and by the DMA Dispatcher to determine the amount of data to copy from dram TCB Buffers to Cache Buffers. An
entry of 0 indicates a size of 2KB.

Bits   Name    Description
31:11  Rsvd    Ignored.
10:00  CchBSz  Size of GRm based Cache Buffers. Bits[6:0] are always zeroes.
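For HdrBSz (8-bit field, 0 means 256) and CchBSz (11-bit field, 0 means 2KB), the zero entry equals two raised to the field width, which suggests the usual "zero encodes the maximum" convention. The Python sketch below decodes a size field on that assumption; the function name is hypothetical, and the sketch is not claimed to apply to other size registers.

```python
def decode_size(entry: int, field_bits: int) -> int:
    """Decode a size field in which an entry of 0 stands for 2**field_bits."""
    entry &= (1 << field_bits) - 1
    return entry if entry else 1 << field_bits


hdr_buffer_size = decode_size(0, 8)      # HdrBSz entry of 0
cache_buffer_size = decode_size(0, 11)   # CchBSz entry of 0
```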
Host Receive SGL Pointer Index (SglPIx)

Location of SGL Pointers relative to the start of a Cache Buffer. Used by the DMA Dispatcher to fetch the RcvSglPtr or
XmtSglPtr during SGL mode operation.

Bits   Name    Description
31:09  Rsvd    Ignored.
08:00  SglPIx  Offset of RcvSglPtr. Bits[3:0] are always zeroes.

Memory Descriptor Index (MemDscIx)

Location of the Next Receive Memory Descriptor relative to the start of a Cache Buffer. Used by the DMA Dispatcher to
specify a data destination address during SGL mode operation.

Bits   Name      Description
31:09  Rsvd      Ignored.
08:00  MemDscIx  Offset of RcvMemDsc. Bits[3:0] are always zeroes.
Receive Queue Index (TRQIx)

Start of the Receive Queue relative to the start of a Cache Buffer. Used by the DMA Dispatcher for TCB mode operations
to specify the amount of data to be copied. Used by the CPU to formulate Receive Queue read and write addresses.

Bits   Name   Description
31:09  Rsvd   Ignored.
08:00  TRQIx  Offset of TcbRcvLis. Bits[3:0] are always zeroes.

Receive Queue Size (TRQSz)

Size of the Receive Queue. Used by the DMA Dispatcher for TCB mode operations to specify the amount of data to be
copied. Used by the CPU to determine roll-over boundaries for Receive Queue Write Sequence and Receive Queue Read
Sequence. An entry of 0 indicates a size of 1KB.

Bits   Name   Description
31:09  Rsvd   Ignored.
08:00  TRQSz  Size of TRQ. Bits[3:0] are always zeroes.
TCB Buffer Base (TcbBBs)

Host address at which the first TCB resides. Used by the DMA Dispatcher to formulate host addresses during
TCB mode operations.

Bits   Name    Description
63:00  TcbBBs  Start of Host based TCBs. Bits[10:0] are always zeroes.

Dram Queue Base (DrmQBs)

Dram address at which the first dram queue resides. Used by the Queue Manager to formulate dram addresses during
queue body read and write operations.

Bits   Name    Description
31:28  Rsvd    Ignored.
27:00  DrmQBs  Start of dram based queues. Bits[17:0] are always zeroes.
MATH COPROCESSOR (MCp)

Sahara contains hardware to execute divide/multiply operations. There is only one set of hardware, so only one processor
may be using it at any one time.

The divider is used by requesting QLck[0] while writing to the dividend register. If the lock is not granted then the write will
be inhibited, permitting a single-instruction loop until the lock is granted. The operation is then initiated by writing to the
divisor register, which will cause test condition MthBsy to assert. When complete, MthBsy status will be reset and the result
can be read from the quotient and dividend register.

Divide is executed sequentially 2 bits at a time. The number of clocks taken is deterministic, assuming the sizes of
the operands are known. For divide, the number of cycles taken can be calculated as follows:

MS_Bit_divend = bit position of most significant 1 bit in dividend
MS_Bit_divisor = bit position of most significant 1 bit in divisor
Number of clocks to complete = MS_Bit_divend/2 - MS_Bit_divisor/2 + 2

So if, for instance, we know that the dividend is less than 64K (fits in bits 15-0) and the divisor may be as small as 2
(represented by bit 1), then the maximum number of clocks to complete is 15/2 - 1/2 + 2 = 7 - 0 + 2 = 9 cycles.

The multiply is performed by requesting QLck[0] while writing to the multiplicand register. If the lock is not granted then the
write will be inhibited, permitting a single-instruction loop until the lock is granted. The operation is then initiated by writing
to the multiplier register, which will cause test condition MthBsy to assert. When complete, MthBsy status will be reset and
the result can be read from the product register.

Multiply time is only dependent on the size of the multiplier. The number of cycles taken for multiply may be calculated by:

MS_Bit_multiplier = bit position of most significant 1 bit in multiplier
Number of clocks to complete = MS_Bit_multiplier/2 + 1

So to multiply by a 16-bit number would take (15/2 + 1) or 8 clocks.
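The two cycle-count formulas can be evaluated with a short Python sketch. It assumes the divisions by two are integer (truncating) divisions, which is what the worked examples imply (15/2 is taken as 7 and 1/2 as 0). The function names are hypothetical.

```python
def ms_bit(value: int) -> int:
    """Bit position of the most significant 1 bit (bit 0 is the LSB)."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    return value.bit_length() - 1


def divide_clocks(dividend: int, divisor: int) -> int:
    """MS_Bit_divend/2 - MS_Bit_divisor/2 + 2, using integer division."""
    return ms_bit(dividend) // 2 - ms_bit(divisor) // 2 + 2


def multiply_clocks(multiplier: int) -> int:
    """MS_Bit_multiplier/2 + 1, using integer division."""
    return ms_bit(multiplier) // 2 + 1
```

For a dividend of 0xFFFF and a divisor of 2, `divide_clocks` gives 9, and `multiply_clocks(0x8000)` gives 8, matching the two examples in the text.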

Queues

The Queues are utilized by the CPU for communication with modules or between processes. There is a dedicated Queue
Ram which holds the queue data. The queues can be directly accessed by the CPU without need for issuing commands.
That is to say that the CPU can read or write a queue with data. The instruction which performs the read or write must
perform a test to determine if the read or write was successful.

There are three types of queues. Ingress queues hold information which is passing from a functional module to the CPU.
FIG. 38 shows an Ingress Queue. Egress queues hold information which is passing from the CPU to a functional module.
FIG. 39 shows an Egress Queue. Local queues hold information which is passing between processes that are running on
the CPU.

EVENT MANAGER (EvtMgr/EMg)

Events and CPU Contexts are inextricably bound. DMA response and run events invoke specific CPU Contexts, while all
other events demand the allocation of a free CPU Context for servicing to proceed. FIG. 40 shows an Event Manager. The
Event Manager combines CPU Context management with event management in order to
reduce idle loop processing to a minimum. EvtMgr implements context control registers which allow the CPU to force
context state transitions. Current context state can be tested or forced to idle, busy or sleep. Free context allocation is also
made possible through the CtxSvr register, which provides single-cycle servicing of requests without a need for spin-locks.

Event control registers provide the CPU a method to enable or disable events and to service events by providing vector
generation with automated context allocation. Events serviced, listed in order of priority, are:

□ ErrEvt - DMA Error Event.
□ RspEvt - DMA Completion Event.
□ BRQEvt - System Buffer Request Event.
□ RunEvt - Run Request Event.
□ DbgEvt - Debug Event.
□ FW3Evt - Firmware Event 3.
□ HstEvt - Slave Write Event.
□ TmrEvt - Interval Timer Event.
□ FW2Evt - Firmware Event 2.
□ FSMEvt - Finite State Machine Event.
□ RSqEvt - RcvSqr Event.
□ FW1Evt - Firmware Event 1.
□ CmdEvt - Command Ready Event.
□ LnkEvt - Link Change Event.
□ FW0Evt - Firmware Event 0.
□ ParEvt - ECC Error Event.

EvtMgr prioritizes events and presents a context to the CPU along with a vector to be used for code branching. Event
vectoring is accomplished when the CPU reads the Event Vector (EvtVec) register, which contains an event vector in bits
[3:0] and a CPU Context in bits [28:24]. The instruction adds the retrieved vector to a vector-table base-address constant,
loading the resulting value into the program counter, thereby accomplishing a branch-relative function. The instruction
actually utilizes the CpCxId destination operand along with a flag modifier which specifies the pc as a secondary
destination. The actual instruction would appear something like:

Add EvtVec VTblAdr CpCxId, FlgLdPc; //Vector into event table.

EvtVec is an EvtMgr register, VTblAdr is the instruction address where the vector table begins, CpCxId is the current CPU's
context ID register and FlgLdPc specifies that the alu results also be written to the program counter. The final effect is for
the CPU Context to be switched and the event to be decoded within a single cycle. A single exception exists for the
RunEvt, for which the EvtVec register does not provide the needed context for resumes.

Reading the EvtVec register causes the event type associated with the current event vector to be disabled by clearing its
corresponding bit in the Event Enable register (EvtEnb) or, in the case of a RspEvt, by setting the context to the busy state.
The effect is to inhibit duplicate event service until explicitly enabled at a later time. The event type may be re-enabled by
writing its bit position in the EvtEnb register or CtxSlp register. The vector table takes the following form.

Vec  Event   Instruction
0    RspEvt  Mov DmdRspQ CpRgXX Rtx;            //Save DMA response and re-enter.
1    BRQEvt  Jmp BRQEvtSvc;                     //
2    RunEvt  Mov CtxRunQ CpCxId Jmp RunEvtSvc;  //Save run event descriptor.
3    DbgEvt  Jmp DbgEvtSvc;                     //
4    FW3Evt  Jmp FW3EvtSvc;                     //
5    HstEvt  Mov HstEvtQ CpRgXX Jmp HstEvtSvc;  //Save lower 32 bits of descriptor.
6    TmrEvt  Jmp TmrEvtSvc;                     //
7    FW2Evt  Jmp FW2EvtSvc;                     //
8    FSMEvt  Mov FSMEvtQ CpRgXX Jmp FSMEvtSvc;  //Save FSM event descriptor.
9    RSqEvt  Mov RSqEvtQ CpRgXX Jmp RspEvtSvc;  //Save event descriptor.
A    FW1Evt  Jmp FW1EvtSvc;                     //
B    CmdEvt  Mov HstCmdQ CpRgXX Jmp CmdRdySvc;  //Save command descriptor.
C    LnkEvt  Jmp LnkEvtSvc;                     //
D    FW0Evt  Jmp FW0EvtSvc;                     //
E    ParEvt  Jmp ParEvtSvc;                     //
F    NulEvt  Jmp IdleLoop;                      //No event detected.
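The single-instruction vectoring described for EvtVec can be sketched in Python. The bit positions of the context and vector come from the text; two points are assumptions made for the sketch: the program counter is taken to be 16 bits wide (the width of JmpAd), and the vector-table base is assumed small enough that the addition never carries into the context field. The function name is hypothetical.

```python
def vector_dispatch(evt_vec: int, vtbl_adr: int) -> tuple:
    """Model the effect of Add EvtVec VTblAdr CpCxId, FlgLdPc.

    Returns (new CpCxId, new program counter).
    """
    alu = (evt_vec + vtbl_adr) & 0xFFFFFFFF
    cp_cx_id = (alu >> 24) & 0x1F   # CPU Context from bits [28:24]
    pc = alu & 0xFFFF               # vector in bits [3:0] added to the table base
    return cp_cx_id, pc


# Context 9 with a CmdEvt vector (0xB) and a table assumed to start at 0x0100.
ctx, pc = vector_dispatch((9 << 24) | 0xB, 0x0100)
```

One ALU result therefore feeds two destinations: the context field selects the new CPU Context while the low bits select the table entry to execute.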

EvtMgr provides an event mask for each of the CPUs. This allows ucode to configure each CPU with a unique mask for
the purpose of distributing the load for servicing events. Also, a single CPU can be defined to service utility functions.
RSqEvt and CmdEvt priorities can be shared. Each time the EvtMgr issues RSqEvt or CmdEvt in response to an EvtVec
read, the issued event is assigned the lesser of the two priorities while the other event is assigned the greater of the two
priorities, thus ensuring fairness. This is accomplished by setting PriTgl each time RSqEvt is issued.
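By way of illustration only, the shared-priority scheme may be sketched as follows. The sketch assumes that the toggle flag is cleared when CmdEvt is issued, which the description does not state; the class name SharedPriority and its members are hypothetical.

```python
class SharedPriority:
    """Two events alternate between the higher and the lower priority slot."""

    def __init__(self):
        # Stand-in for the toggle flag; True after RSqEvt was the last issued.
        self.pri_tgl = False

    def pick(self, rsq_pending, cmd_pending):
        """Return the event to issue, or None when neither is pending."""
        if rsq_pending and cmd_pending:
            # The event issued last now holds the lesser priority.
            winner = "CmdEvt" if self.pri_tgl else "RSqEvt"
        elif rsq_pending:
            winner = "RSqEvt"
        elif cmd_pending:
            winner = "CmdEvt"
        else:
            return None
        self.pri_tgl = (winner == "RSqEvt")  # assumed: cleared on CmdEvt
        return winner
```

When both events remain pending, successive calls alternate between them, so neither event can starve the other.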

Idle Contexts Register (CtxIdl)
Bit    Description
31:00  R/W - CtxIdl[31:00]. Set by writing "1". Cleared by writing CtxBsy or CtxSlp.

Busy Contexts Register (CtxBsy)
Bit    Description
31:00  R/W - CtxBsy[31:00]. Set by writing "1". Cleared by writing CtxIdl or CtxSlp.

Sleep Contexts Register (CtxSlp)
Bit    Description
31:00  R/W - CtxSlp[31:00]. Set by writing "1". Cleared by writing CtxBsy or CtxIdl.
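By way of illustration only, the three context registers may be modelled as mutually exclusive bit sets: writing a "1" to a bit of one register sets that bit and clears the same bit in the other two. The class CtxStateModel and its methods are hypothetical names used for this sketch.

```python
class CtxStateModel:
    """Each of 32 contexts is in at most one of the idle, busy or sleep sets."""

    def __init__(self):
        self.regs = {"CtxIdl": 0, "CtxBsy": 0, "CtxSlp": 0}

    def write(self, name, mask):
        """Write-one-to-set on `name`; the same bits clear in the other two."""
        mask &= 0xFFFFFFFF
        for reg in self.regs:
            if reg == name:
                self.regs[reg] |= mask
            else:
                self.regs[reg] &= ~mask

    def state_of(self, ctx):
        """Return the register name holding context `ctx`, or None."""
        for reg, value in self.regs.items():
            if value & (1 << ctx):
                return reg
        return None
```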
CPU Event Mask Register (CpuMsk[CurCpu])

Bit    Description
31:10  Reserved.
0E:0E  R/W bit - CpuMsk[11]. Writing a "1" enables ParEvt. Writing a "0" disables ParEvt.
0D:0D  R/W bit - CpuMsk[11]. Writing a "1" enables BRQEvt. Writing a "0" disables BRQEvt.
0C:0C  R/W bit - CpuMsk[10]. Writing a "1" enables LnkEvt. Writing a "0" disables LnkEvt.
0B:0B  R/W bit - CpuMsk[09]. Writing a "1" enables CmdEvt. Writing a "0" disables CmdEvt.
0A:0A  R/W bit - CpuMsk[11]. Writing a "1" enables FW3Evt. Writing a "0" disables FW3Evt.
09:09  R/W bit - CpuMsk[08]. Writing a "1" enables RSqEvt. Writing a "0" disables RSqEvt.
08:08  R/W bit - CpuMsk[07]. Writing a "1" enables FSMEvt. Writing a "0" disables FSMEvt.
07:07  R/W bit - CpuMsk[11]. Writing a "1" enables FW2Evt. Writing a "0" disables FW2Evt.
06:06  R/W bit - CpuMsk[06]. Writing a "1" enables TmrEvt. Writing a "0" disables TmrEvt.
05:05  R/W bit - CpuMsk[05]. Writing a "1" enables HstEvt. Writing a "0" disables HstEvt.
04:04  R/W bit - CpuMsk[11]. Writing a "1" enables FW1Evt. Writing a "0" disables FW1Evt.
03:03  R/W bit - CpuMsk[02]. Writing a "1" enables DbgEvt. Writing a "0" disables DbgEvt.
02:02  R/W bit - CpuMsk[01]. Writing a "1" enables RunEvt. Writing a "0" disables RunEvt.
01:01  R/W bit - CpuMsk[11]. Writing a "1" enables FW0Evt. Writing a "0" disables FW0Evt.
00:00  R/W bit - CpuMsk[00]. Writing a "1" enables RspEvt. Writing a "0" disables RspEvt.
TCB MANAGER (TcbMgr/TMg)

FIG 41 is a Block Diagram of a TCB Manager. Sahara is capable of offloading up to 4096 TCBs. TCBs, which reside in
external memory, are copied into Cache Buffers (Cbfs) for a CPU to access them. Cache Buffers are implemented in
contiguous locations of Global Ram to which the CPU has ready access. A maximum of 128 Cache Buffers can be
implemented in this embodiment, but Sahara will support fewer Cache Buffers for situations which need to conserve
Global Ram.

Due to Sahara's multi-CPU and multi-context architecture, TCB and Cache Buffer access is coordinated through the use
of TCB Locks and Cache Buffer Locks. TcbMgr provides the services needed to facilitate these locks.

TcbMgr is commanded via a register which has sixteen aliases whereby each alias represents a unique command.
Command parameters are provided by the alu output during CPU instructions which specify one of the command aliases
as the destination operand. Command responses are immediately saved to the CPU's accumulator.

FIG 41 illustrates the TCB Manager's storage elements. A TCB Lock register is provided for each of the thirty-two CPU
Contexts and a Cache Buffer Control register is provided for each of the 128 possible Cache Buffers.

TCB Locks 

The objective of TCB locking is to allow logical CPUs, while executing a Context-specific thread, to request ownership of a
TCB for the purpose of reading the TCB, modifying the TCB or copying the TCB to or from the host. It is also the objective
of TCB locking to enqueue requests such that they are granted in the order received with respect to like priority requests
and such that high priority requests are granted prior to normal priority requests.

Up to 4096 TCBs are supported and a maximum of 32 TCBs, one per CPU Context, can be locked at a given time. Lock
ownership is granted to a CPU Context and each CPU Context can own no more than a single TCB. TCB ownership is
requested when a CPU writes a CpxId and TcbId to the TlkNmlReq or TlkRcvReq operand. Lock ownership will be granted
immediately if the TCB is not already owned. If the TCB is already owned and the ChnInh option is not selected, then the
TCB Lock request will be chained. Chained TCB Lock requests will be granted at future times as TCB Lock release
operations pass TCB ownership to the CPU Context which initiated the next lock request.

Priority sub-chaining is effected by the TlkRcvReq and TlkRcvPop operands. This allows a quick copy from the RcvSqr
event queue to the TCB receive-list and the ensuing release of the CPU Context for re-use. This feature increases the
availability of CPU Contexts for performing work by allowing them to be placed back into the free context pool. The
dequeuing of high priority requests from the request chain does not affect the current lock ownership. It allows the current
CPU to change to the CPU Context which generated the high priority request, then copy the receive-event descriptor from
a context specific register to a CPU specific register, switch back to the previous CPU Context, release the dequeued CPU
Context for re-use and finally push the retrieved receive-event descriptor on to the TCB receive-list.

Each CPU Context has a dedicated TCB Lock register set of which the purpose is to describe a lock request. The TCB
Lock register set is defined as follows.

TlkReqVld - Request Valid indicates that an active lock request exists and serves as a valid indication for all other
registers. This register is set by the TlkNmlReq and TlkRcvReq commands and it is cleared by the TlkLckRls and
TlkRcvPop commands.

TlkTcbNum - TCB Number specifies one of 4096 TCBs to be locked. This register is modified by only the TlkNmlReq and
TlkRcvReq commands. The contents of TlkTcbNum are continuously compared with the command parameter CmdTcb
and the resultant status is used to determine if the specified TCB is locked or unlocked.

TlkGntFlg - Grant Flag indicates that the associated CPU Context has been granted TCB ownership. Set by the
commands TlkNmlReq and TlkRcvReq or when the CPU Context has a queued request which is scheduled to be serviced
next and a different CPU Context relinquishes ownership. Grant Flag is cleared during the TlkLckRls command.

TlkChnFlg - Chain Flag indicates that a different CPU Context has requested the same TCB Lock and that its request has
been scheduled to be serviced next. TlkChnFlg is set during TlkNmlReq and TlkRcvReq commands and is cleared during
TlkRcvPop and TlkLckRls commands.

TlkPriEnd - Priority End indicates that the request is the last request in the priority sub-chain. It is set or cleared during
TlkRcvPop, TlkLckRls, TMgReqNml and TMgReqHgh commands.

TlkNxtCpx - Next CpX specifies the CPU Context of the next requester and is valid if TlkChnFlg is asserted.

FIG 42 illustrates how TCB Lock registers form a request chain. CpX[5] (CPU Context 5) is the current lock owner, CpX[2]
is the next requester followed by CpX[0] and finally CpX[11]. ReqVld = 1 indicates a valid request and ownership. ReqVld =
0 indicates that the corresponding CPU Context is not requesting a TCB Lock and that all of the other registers are invalid.
GntFlg is set to indicate TCB ownership. TcbNum indicates which TCB Lock is requested. ChnFlg indicates that NxtCpx is
valid. NxtCpx points to the next requesting CPU Context. PriEnd indicates the end of the high priority request sub-chain.

The following four commands allow the CPU to control TCB Locking.

Request TCB Lock - Normal Priority (TlkNmlReq)

Requests, at a normal priority level, a lock for the specified TCB on behalf of the specified CPU Context. If the context
already has a request for a TCB Lock other than the one specified, then TmgErr status is returned because a context may
never own more than a single lock. If the context already has a request for the specified TCB Lock but does not yet own
the TCB, then TmgErr status is returned because the specified context should be resuming with the lock granted. If the
specified context already has ownership of the specified TCB, then TmgDup status is returned indicating successful
resumption of a thread. If the specified TCB is owned by another context and ChnInh is reset, then the request will be
linked to the end of the request chain and TmgSlp status will be returned indicating that the thread should retire until the
lock is granted. If the specified TCB is owned by another context and ChnInh is set, then the request will not be linked and
TmgSlp status will be returned. The request chaining inhibit is provided for unanticipated situations.






CmdRg  Field   Description
31:31  ChnInh  Request chaining inhibit.
30:29  Rsvd
28:24  CpuCx   CPU Context identifier.
23:12  Rsvd
11:00  TcbId   TCB identifier.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgSlp 1:TmgDup 0:TmgGnt
28:00  Rsvd    Zeroes.
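By way of illustration only, the command and response layouts above may be exercised with the following Python helpers, which build a lock request word and extract the status field from a response word. The function names pack_tlk_req and decode_status are hypothetical; only the bit positions are taken from the tables.

```python
# Status codes from bits 31:29 of the lock request response register.
STATUS = {7: "TmgErr", 4: "TmgSlp", 1: "TmgDup", 0: "TmgGnt"}


def pack_tlk_req(cpu_cx, tcb_id, chn_inh=False):
    """Build a request word: ChnInh at 31, CpuCx at 28:24, TcbId at 11:00."""
    if not 0 <= cpu_cx < 32:
        raise ValueError("CpuCx is a 5-bit field")
    if not 0 <= tcb_id < 4096:
        raise ValueError("TcbId is a 12-bit field")
    return (int(chn_inh) << 31) | (cpu_cx << 24) | tcb_id


def decode_status(rsp):
    """Return the status name held in bits 31:29 of a response word."""
    return STATUS.get((rsp >> 29) & 0x7, "unknown")
```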
Release TCB Lock (TlkLckRls)

Requests that the specified CPU Context relinquish the specified TCB Lock. If the CPU Context does not own the TCB,
then TmgErr status is returned. If a chained request is found, it is immediately granted the TCB Lock and the ID of the new
owner is returned in the response along with TmgRsm status. The current logical CPU may put the CPU Context ID of the
new owner on to a resume list or it may immediately resume execution of the thread by assuming the CPU Context. If no
chained request is found, then TmgGnt status is returned.

CmdRg  Field   Description
31:29  Rsvd
28:24  CpuCx   CPU Context identifier.
23:12  Rsvd
11:00  TcbId   TCB identifier.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgRsm 0:TmgGnt
28:05  Rsvd    Zeroes.
04:00  NxtCpx  Next requester CPU Context.

Request TCB Lock - Receive Priority (TlkRcvReq)

Requests, at a high priority level, a lock for the specified TCB on behalf of the specified CPU Context. If the context
already has a request for a TCB Lock other than the one specified, then TmgErr status is returned because a context may
never own more than a single lock. If the context already has a request for the specified TCB Lock but does not yet own
the TCB, then TmgErr status is returned because the specified context should be resuming with the lock granted. If the
specified context already has ownership of the specified TCB, then TmgDup status is returned indicating successful
resumption of a thread. If the specified TCB is owned by another context and ChnInh is reset, then the request will be
linked to the end of the priority request sub-chain and TmgSlp status will be returned. If a priority request sub-chain has not
been previously established, then one will be established behind the head of the lock request chain by inserting the high
priority request into the request chain between the current owner and the next normal priority requester. The priority
sub-chaining affords a means to quickly pass RcvSqr events through to the receive queue residing within a TCB. If the
specified TCB is owned by another context and ChnInh is set, then the request will not be linked and TmgSlp status will be
returned. The request chaining inhibit is provided for unanticipated situations.

CmdRg  Field   Description
       ChnInh  Request chaining inhibit.
       CpuCx   CPU Context identifier.
       TcbId   TCB identifier.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgSlp 1:TmgDup 0:TmgGnt
28:00  Rsvd    Zeroes.

Pop Receive Request (TlkRcvPop)

Causes the removal of the next TCB Lock request in the receive sub-chain of the specified CPU Context. If the CPU
Context does not own the specified TCB, then TmgErr status is returned. If there is no chained receive request detected,
then TmgEnd status is returned.

CmdRg  Field   Description
       CpuCx   CPU Context identifier.
       TcbId   TCB identifier.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgEnd 0:TmgGnt
28:05  Rsvd    Zeroes.
04:00  NxtCpx  Next requester CPU Context.
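By way of illustration only, the ordering behaviour of the four lock commands may be modelled with one list per TCB, the first entry being the owner. This is a behavioural sketch, not the register-linked chain of FIG 42. It assumes that a request for an unowned TCB returns TmgGnt and that a successful pop returns TmgGnt with the dequeued context; the class TcbLockModel and its methods are hypothetical.

```python
class TcbLockModel:
    def __init__(self):
        self.chains = {}  # TcbId -> list of (cpx, is_high); index 0 is the owner

    def request(self, cpx, tcb, high=False, chn_inh=False):
        """Model of TlkNmlReq (high=False) and TlkRcvReq (high=True)."""
        for t, chain in self.chains.items():
            for i, (c, _) in enumerate(chain):
                if c == cpx:  # the context already has a request somewhere
                    return "TmgDup" if (t == tcb and i == 0) else "TmgErr"
        chain = self.chains.setdefault(tcb, [])
        if not chain:
            chain.append((cpx, high))
            return "TmgGnt"            # assumed result for an unowned TCB
        if not chn_inh:
            pos = len(chain)           # normal priority: end of the chain
            if high:                   # high priority: end of the sub-chain
                pos = 1
                while pos < len(chain) and chain[pos][1]:
                    pos += 1
            chain.insert(pos, (cpx, high))
        return "TmgSlp"

    def release(self, cpx, tcb):
        """Model of TlkLckRls: pass ownership to the next requester, if any."""
        chain = self.chains.get(tcb, [])
        if not chain or chain[0][0] != cpx:
            return ("TmgErr", None)
        chain.pop(0)
        if chain:
            return ("TmgRsm", chain[0][0])
        del self.chains[tcb]
        return ("TmgGnt", None)

    def pop_receive(self, cpx, tcb):
        """Model of TlkRcvPop: dequeue the next high priority request."""
        chain = self.chains.get(tcb, [])
        if not chain or chain[0][0] != cpx:
            return ("TmgErr", None)
        if len(chain) > 1 and chain[1][1]:
            return ("TmgGnt", chain.pop(1)[0])  # ownership is unchanged
        return ("TmgEnd", None)
```

A high priority request is placed behind the owner and ahead of all normal priority requests, and popping it leaves the owner in place, consistent with the description of priority sub-chaining above.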

TCB Cache Control

Cache Buffers (Cbfs) are areas within Global Ram which have been reserved for the caching of TCBs. TcbMgr provides
control and status registers for 128 Cache Buffers and can be configured to support fewer Cache Buffers. Each of the
Cache Buffers has an associated Cache Buffer register set comprising control and status registers which are dedicated to
describing the Cache Buffer state and any registered TCB. TcbMgr uses these registers to identify and to lock Cache
Buffers for TCB access by the CPU. The Cache Buffer register set is defined as follows.

CbfState - Each of the Cache Buffers is assigned one of four states, DISABLED, VACANT, IDLE or BUSY, as indicated by
the two CbfState flip-flops. The DISABLED state indicates that the Cache Buffer is not available for caching of TCBs. The
VACANT state indicates that the Cache Buffer is available for caching of TCBs but that no TCB is currently registered as
resident. The IDLE state indicates that a TCB has been registered as resident and that the Cache Buffer is unlocked (not
BUSY). The BUSY state indicates that a TCB has been registered as resident and that the Cache Buffer has been locked
for exclusive use by a CPU Context.

CbfTcbNum - TCB Number identifies a resident TCB. The identifier is valid for IDLE and BUSY states only. This value is
compared against the command parameter CmdTcb and then the result is used to confirm TCB residency for a specified
Cache Buffer or to search for a Cache Buffer wherein a desired TCB resides.

CbfDtyFlg - A Dirty Flag is provided for each Cache Buffer to indicate that the resident TCB has been modified and needs
to be written back to external memory. This bit is valid during the IDLE and BUSY states only. The Dirty Flag also serves
to inhibit invalidation of a modified TCB. Attempts to register a new TCB or invalidate a current registration will be blocked
if the Dirty Flag of the specified Cache Buffer is asserted. This protective feature can be circumvented by asserting the
Dirty Inhibit (DtyInh) parameter when initiating the command.

CbfSlpFlg - Normally, TCB Locks ensure collision avoidance when requesting a Cache Buffer, but a situation may
sometimes occur during which a collision takes place. Sleep Flag indicates that a thread has encountered this situation
and has suspended execution while awaiting Cache Buffer collision resolution. The situation occurs whenever a requested
TCB is not found to be resident and an IDLE Cache Buffer containing a modified TCB is the only Cache Buffer type
available in which to cache the desired TCB. The modified TCB must be written back to the external TCB Buffer before
the desired TCB can be registered. During this time, if another CPU requests the dirty TCB, then CbfSlpFlg will be
asserted, the context of the requesting CPU will be saved to CbfSlpCpx and then the thread will be suspended. When the
Cache Buffer owner registers the new TCB, a response is given which indicates that the suspended thread must be
resumed.

CbfSlpCpx - Sleeping CPU Context indicates the thread which was suspended as a result of a Cache Buffer collision.

CbfCycTag - Each Cache Buffer has an associated Cycle Tag register which indicates the order in which it is to be
removed from the VACANT Pool or the IDLE Pool for the purpose of caching a currently non-resident TCB. Two counters,
VACANT Cache Buffer Count (VacCbfCnt) and IDLE Cache Buffer Count (IdlCbfCnt), indicate the number of Cache
Buffers which are in the VACANT or IDLE states. When a Cache Buffer transitions to the VACANT state or to the IDLE
state, the value in VacCbfCnt or IdlCbfCnt is copied to the CbfCycTag register and then the counter is incremented to
indicate that a Cache Buffer has been added to the pool. When a Cache Buffer is removed from the VACANT Pool or the
IDLE Pool, any Cache Buffer in the same pool will have its CbfCycTag decremented providing that its CbfCycTag
contains a value greater than that of the exiting Cache Buffer. Also, the respective counter, VacCbfCnt or IdlCbfCnt,
is decremented to indicate that one less Cache Buffer is in the pool. CbfCycTag is valid for the VACANT and IDLE states
and is not valid for the DISABLED and BUSY states. The CbfCycTag value of each Cache Buffer is continuously tested for
a value of zero, which indicates that it is the least recently used Cache Buffer in its pool. In this way, TcbMgr can select a
single Cache Buffer from the VACANT Pool or from the IDLE Pool in the event that a targeted TCB is found to be
nonresident.
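By way of illustration only, the Cycle Tag bookkeeping for a single pool (VACANT or IDLE) may be sketched as follows. The class CbfPool and its method names are hypothetical; the pool count is modelled by the number of stored tags.

```python
class CbfPool:
    """One pool ordered by cycle tags; the buffer with tag 0 is the oldest."""

    def __init__(self):
        self.tag = {}  # CbfId -> CbfCycTag

    @property
    def count(self):
        """Stand-in for VacCbfCnt or IdlCbfCnt."""
        return len(self.tag)

    def add(self, cbf_id):
        # Copy the current count into the tag; the count then grows by one.
        self.tag[cbf_id] = len(self.tag)

    def remove(self, cbf_id):
        gone = self.tag.pop(cbf_id)
        for other, t in self.tag.items():
            if t > gone:               # only newer entries move down by one
                self.tag[other] = t - 1

    def oldest(self):
        """Return the CbfId whose tag is zero, or None for an empty pool."""
        for cbf_id, t in self.tag.items():
            if t == 0:
                return cbf_id
        return None
```

Because tags above the departing one are decremented, the tags always form a gap-free sequence from zero, so exactly one buffer in a non-empty pool tests as zero.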

The following five commands allow the CPU to initiate Cache Buffer search, lock and registration operations.

Get Cache Buffer. (CchBufGet)

This command requests assignment of a Cache Buffer for the specified TCB. TMg first performs a registry search. If
an IDLE Cache Buffer is found wherein the specified TCB resides, then the Cache Buffer is made BUSY. TmgGnt status is
returned along with the CbfId.

If a BUSY Cache Buffer is found, wherein the specified TCB resides, and SlpInh is set, then TmgSlp status is returned
indicating that the Cache Buffer cannot be reserved for use by the requestor.

If a BUSY Cache Buffer is found, wherein the specified TCB resides, and SlpInh is not set, then the specified CPU Context
ID is saved to the CbfSlpCpx and the CbfSlpFlg is set. TmgSlp status is returned indicating that the CPU Context should
suspend operation until the Cache Buffer has been released.

If the specified TCB is not found to be residing in any of the Cache Buffers but an LRE Cache Buffer is detected, then the
Cache Buffer state will be set to BUSY, the TCB will be registered to the Cache Buffer and TmgFch status plus CbfId will
be returned.

If the specified TCB is not found to be residing in any of the Cache Buffers and no LRE Cache Buffer is detected but an
LRU Cache Buffer is detected that does not have its DtyFlg asserted, then the Cache Buffer state will be set to BUSY, the
TCB will be registered to the Cache Buffer and TmgFch status will be returned along with the CbfId. The program thread
should then schedule a DMA operation to copy the TCB from the external TCB Buffer to the Cache Buffer.

If the specified TCB is not found to be residing in any of the Cache Buffers, no LRE Cache Buffer is detected and an LRU
Cache Buffer is detected that has its DtyFlg asserted, then the Cache Buffer state will be set to BUSY and TmgFsh status
will be returned along with the CbfId and its resident TcbId. The program thread should then schedule a DMA operation to
copy the TCB from the internal Cache Buffer to the external TCB Buffer, then upon completion of the DMA, register the
desired TCB by issuing a CchTcbReg command, then schedule a DMA to copy the desired TCB from its external TCB
Buffer to the Cache Buffer.

Conditions                            Response Register
TcbDet, CbfBsy, !SlpFlg, SlpInh       {TmgSlp, 10'b0, CbfId, 12'b0}
TcbDet, CbfBsy, !SlpFlg, !SlpInh      {TmgSlp, 10'b0, CbfId, 12'b0}
TcbDet, !CbfBsy                       {TmgGnt, 10'b0, CbfId, 12'b0}
!TcbDet, LreDet                       {TmgFch, 10'b0, CbfId, 12'b0}
!TcbDet, !LreDet, LruDet, !DtyFlg     {TmgFch, 10'b0, CbfId, 12'b0}
!TcbDet, !LreDet, LruDet, DtyFlg      {TmgFsh, 10'b0, CbfId, TcbId}
Default                               {TmgErr, 10'b0, 7'b0, 12'b0}

CmdRg  Field   Description
       SlpInh  Inhibits modification of SlpFlg and SlpCpx.
       CpuCx   Current CPU Context.
       TcbId   Targeted TCB.

RspRg  Field   Description
31:29  Status  7:TmgErr, 6:TmgFsh, 5:TmgFch, 4:TmgSlp, 0:TmgGnt
28:19  Rsvd    Reserved.
18:12  CbfId   Cache Buffer identifier.
11:00  TcbId   TCB identifier for Flush indication.
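By way of illustration only, the Get Cache Buffer decision may be expressed as a pure function of the condition flags. The sketch reads the condition rows with "!" as negation, so that a BUSY hit whose Sleep Flag is already set falls through to the default error; that reading is an assumption. The function name cch_buf_get and the second return value (whether the requester is recorded as the sleeper) are hypothetical.

```python
def cch_buf_get(tcb_det, cbf_bsy, slp_flg, slp_inh, lre_det, lru_det, dty_flg):
    """Return (status, record_sleeper) for one Get Cache Buffer command."""
    if tcb_det:                        # the TCB is already resident
        if not cbf_bsy:
            return ("TmgGnt", False)   # IDLE buffer becomes BUSY
        if not slp_flg:
            # BUSY elsewhere: sleep, recording the requester unless inhibited
            return ("TmgSlp", not slp_inh)
        return ("TmgErr", False)       # assumed: a sleeper is already recorded
    if lre_det:
        return ("TmgFch", False)       # use the least recently emptied buffer
    if lru_det:
        # A dirty least recently used buffer must be flushed before reuse.
        return ("TmgFsh" if dty_flg else "TmgFch", False)
    return ("TmgErr", False)
```

A TmgFsh result corresponds to the flush, register and fetch sequence described above, whereas TmgFch requires only the fetch DMA.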

Modify Dirty. (CchDtyMod)

Selects the specified Cache Buffer. If the state is BUSY and the specified TCB is registered, then the CbfDtyFlg is written
with the value in DtyDat and a status of TmgGnt is returned. This command is intended primarily as a means to set the
Dirty Flag. Clearing of the Dirty Flag is normally done as a result of invalidating the resident TCB.

Conditions       Response Register
CbfBsy, TcbDet   {TmgGnt, 29'b0}
Default          {TmgErr, 29'b0}

CmdRg  Field   Description
31:31  DtyDat  Data to be written to the Dirty Flag.
30:19  Rsvd
18:12  CbfId   Targeted Cbf.
11:00  TcbId   Expected resident TCB.

RspRg  Field   Description
31:29  Status  7:TmgErr 0:TmgGnt
28:00  Rsvd    Reserved.
Evict and Register. (CchTcbReg)

Requests that the TCB which is currently resident in the Cache Buffer be evicted and that the TCB which is locked by the
specified CPU Context (TlkTcbNum[CmdCpx]) be registered. The Cache Buffer must be BUSY, CmdTcb must match the
current registrant and Dirty must be reset or overridden with DtyInh in order for this command to succeed. SlpFlg without
SlpInh causes SlpCpx to be returned along with TmgRsm status, otherwise TmgGnt status is returned. This command is
intended to register a TCB after completing a flush DMA operation.

Conditions                                 Response Register
CbfBsy, TcbDet, DtyFlg, DtyInh, SlpFlg     {TmgRsm, 5'b0, SlpCpx, 19'b0}
CbfBsy, TcbDet, !DtyFlg, SlpFlg            {TmgRsm, 5'b0, SlpCpx, 19'b0}
CbfBsy, TcbDet, DtyFlg, DtyInh, !SlpFlg    {TmgGnt, 29'b0}
CbfBsy, TcbDet, !DtyFlg, !SlpFlg           {TmgGnt, 29'b0}
Default                                    {TmgErr, 29'b0}

CmdRg  Field   Description
31:31  DtyInh  Inhibits Dirty Flag detection.
30:29  Rsvd
28:24  CpuCx   Current CPU Context.
23:19  Rsvd
18:12  CbfId   Targeted Cbf.
11:00  TcbId   TCB to evict.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgRsm 0:TmgGnt
28:05  Rsvd    Reserved.
04:00  SlpCpx  Cpu Context to resume.

Evict and Release. (CchTcbEvc)

Requests that the TCB which is currently resident in the Cache Buffer be evicted and that the Cache Buffer then be
released to the Vacant Pool. The Cache Buffer must be BUSY, CmdTcb must match the current registrant and Dirty Flag
must be reset or overridden with DtyInh in order for this command to succeed. SlpFlg without SlpInh causes SlpCpx to be
returned along with TmgRsm status, otherwise TmgGnt status is returned.

Conditions                                 Response Register
Default                                    {TmgErr, 29'b0}
CbfBsy, TcbDet, DtyFlg, DtyInh, SlpFlg     {TmgRsm, 4'b0, SlpCpx, 19'b0}
CbfBsy, TcbDet, !DtyFlg, SlpFlg            {TmgRsm, 4'b0, SlpCpx, 19'b0}
CbfBsy, TcbDet, DtyFlg, DtyInh, !SlpFlg    {TmgGnt, 29'b0}
CbfBsy, TcbDet, !DtyFlg, !SlpFlg           {TmgGnt, 29'b0}

CmdRg  Field   Description
       DtyInh  Inhibits Dirty Flag detection.
       CbfId   Targeted Cbf.
       TcbId   Expected resident TCB.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgRsm 0:TmgGnt
28:05  Rsvd    Reserved.
04:00  SlpCpx  Cpu Context to resume.
Release. (CchBufRls)

Selects the specified Cache Buffer, then verifies Cache Buffer BUSY and TCB registration before releasing the Cache
Buffer. SlpFlg found causes SlpCpx to be returned along with TmgRsm status, otherwise a TmgGnt status is returned.
DtySet will cause the Dirty Flag to be asserted in the event of a successful Cbf release.

Conditions                 Response Register
Default                    {TmgErr, 29'b0}
CbfBsy, TcbDet, SlpFlg     {TmgRsm, 4'b0, SlpCpx, 19'b0}
CbfBsy, TcbDet, !SlpFlg    {TmgGnt, 29'b0}

CmdRg  Field   Description
31:31  DtySet  Causes the dirty flag to be asserted.
30:19  Rsvd
18:12  CbfId   Targeted Cbf.
11:00  TcbId   Expected resident TCB.

RspRg  Field   Description
31:29  Status  7:TmgErr 4:TmgRsm 0:TmgGnt
28:05  Rsvd    Reserved.
04:00  SlpCpx  Cpu Context to resume.

The following commands are intended for maintenance and debug usage only.

TCB Manager Reset. (TmgReset)

Resets all Cache Buffer registers and all TCB Lock registers.

CmdRg  Field   Description
31:00  Rsvd

RspRg  Field   Description
31:00  Rsvd    Reserved.

TCB Query. (TmgTcbQry)

Performs registry search for the specified TCB and reports Cache Buffer ID. Also performs TCB Lock search for the
specified TCB and reports Cpu Context ID. Additional TCB information can then be obtained by using the returned IDs
along with the CchBufQry and TlkCpxQry commands. This command is intended for debug usage.

CmdRg  Field   Description
31:12  Rsvd
11:00  TcbId   Targeted TCB.

RspRg  Field   Description
31:31  CbfDet  Indicates the TCB is registered to a Cache Buffer.
30:30  TlkDet  Indicates the TCB is locked.
29:24  Rsvd    Reserved.
23:19  CpxId   ID of Cpu Context that has the TCB lock.
18:12  CbfId   ID of Cache Buffer where TCB is registered.
11:00  Rsvd    Reserved.
Cache Buffer Query. (CchBufQry)

Returns information for the specified Cache Buffer. This command is intended for debug usage.

CmdRg  Field   Description
31:19  Rsvd
18:12  CbfId   Targeted Cbf.
11:00  TcbId   Expected resident TCB.

RspRg  Field   Description
31:30  State   3:BUSY, 2:IDLE, 1:VACANT, 0:DISABLED
29:29  SlpFlg  Sleep Flag.
28:28  DtyFlg  Dirty Flag.
27:27  CTcEql  Command TCB == CbfTcbNum.
26:26  LreDet  Buffer is least recently emptied.
25:25  LruDet  Buffer is least recently used.
24:24  Rsvd    Reserved.
23:19  SlpCpx  Sleeping CPU Context.
18:12  CycTag  Cache Buffer Cycle Tag.
11:00  TcbId   TCB identifier.
Least Recently Emptied Query. (CchLreQry)

Report on least recently vacated Cache Buffer. Also returns vacant buffer count. Intended for debug usage.

CmdRg  Field   Description
31:00  Rsvd

RspRg  Field   Description
31:31  CbfDet  Vacant Cache Buffer detected.
30:24  VacCnt  Vacant Cbf count. 0 == 128 if CbfDet.
23:19  Rsvd    Reserved.
18:12  CbfId   ID of least recently emptied Cache Buffer.
11:00  Rsvd    Reserved.

Least Recently Used Query. (CchLruQry)

Report on least recently used Cache Buffer. Also returns idle buffer count. Intended for debug usage.

CmdRg  Field   Description
31:00  Rsvd

RspRg  Field   Description
31:31  CbfDet  Idle Cache Buffer detected.
30:24  IdlCnt  Idle Cbf count. 0 == 128 if CbfDet.
23:19  Rsvd    Reserved.
18:12  CbfId   ID of least recently used Cache Buffer.
11:00  TcbId   Resident TCB identifier.
Cache Buffer Enable. (CchBufEnb)

Enables the specified Cache Buffer. Buffer must be in the DISABLED state for this command to succeed. Any other state
will result in a TmgErr status. Intended for initial setup.

CmdRg  Field   Description
31:19  Rsvd
18:12  CbfId   Targeted Cbf.
11:00  Rsvd

RspRg  Field   Description
31:29  Status  7:TmgErr 0:TmgGnt
28:00  Rsvd    Reserved.

TCB Lock Query - Cpu Context (TlkCpxQry)

Returns the lock registers for the specified CPU Context. TcbDet indicates that CmdTcbId is valid and identical to
TlkTcbNum. This command is intended for diagnostic and debug use.

CmdRg  Field   Description
31:29  Rsvd
28:24  CpuCx   CPU Context.
23:12  Rsvd
11:00  TcbId   Expected resident TCB.

RspRg  Field   Description
31:31  ReqVld  Lock request is valid.
30:30  GntFlg  Lock has been granted.
29:29  ChnFlg  Request is chained.
28:28  PriEnd  End of priority sub-chain.
27:27  CTcDet  CmdTcbId == TlkTcbNum.
26:24  Rsvd    Reserved.
23:19  NxtCpx  Next requesting CPU Context.
18:12  Rsvd
11:00  TcbId   Identifies the requested TCB.
HOST BUS INTERFACE ADAPTOR (HstBIA/BIA)

Host Event Queue (HstEvtQ)

FIG 43 is a diagram of Host Event Queue Control/Data Paths. The HstEvtQ is a function implemented within the Dmd. It is
responsible for delivering slave write descriptors from the host bus interface to the CPU.

Write Descriptor Entry:
Bits   Name    Description
31:31  Rsvd    Zero.
30:30  Func2   Write to Memory Space of Function 2.
29:29  Func1   Write to Memory Space of Function 1.
28:28  Func0   Write to Memory Space of Function 0.
27:24  Marks   Lane markers indicate valid bytes.
23:23  WrdVld  Marks == 4'b1111.
22:20  Rsvd    Zeroes.
19:00  SlvAdr  Slave address bits 19:00.

Write Data Entry:
Bits   Name    Description
31:00  SlvDat  Slave data.

if (CmdCd == 0) {
  18:12  CpId    Cpu Id.
  11:07  ExtCd   Specifies 1 of 32 commands.
  06:03  CmdCd   Zero indicates extended (non-TCB) mode.
  02:00          Always 0.
} else {
  18:07  TcbId   Specifies 1 of 4096 TCBs.
  06:03  CmdCd   1 of 15 TCB commands.
  02:00          Always 0.
}
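By way of illustration only, the two-mode layout above may be decoded as follows. The sketch operates on whichever word carries these fields, which this description does not name explicitly; the function name decode_cmd_word and the returned dictionary keys are hypothetical.

```python
def decode_cmd_word(word):
    """Split a command word into extended-mode or TCB-mode fields."""
    if word & 0x7:
        raise ValueError("bits 02:00 are expected to be zero")
    cmd_cd = (word >> 3) & 0xF             # bits 06:03
    if cmd_cd == 0:                        # extended (non-TCB) mode
        return {
            "mode": "extended",
            "CpId": (word >> 12) & 0x7F,   # bits 18:12
            "ExtCd": (word >> 7) & 0x1F,   # bits 11:07, 1 of 32 commands
        }
    return {
        "mode": "tcb",
        "TcbId": (word >> 7) & 0xFFF,      # bits 18:07, 1 of 4096 TCBs
        "CmdCd": cmd_cd,                   # 1 of 15 TCB commands
    }
```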
CONVENTIONAL PARTS

PCI EXPRESS, EIGHT LANE o LVDS I/O Cells o Pll o Phy o Mac o Link Controller o Transaction Controller

RLDRAM, 76 BIT, 500Mb/S/Pin o LVDS, HSTL I/O Cells o Pll o Dll

XGXS/XAUI o CML I/O Cells o Pll o Controller

SERDES o LVDS o SERDES Controller

RGMII o HSTL I/O Cells

MAC o 10/100/1000 Mac o 10Gbe Mac

SRAM o Custom Single Port Sram o Custom Dual Port Srams, 2-RW Ports o Custom Dual Port Srams, 1-R 1-W Port

PLLs

FIG. 44 is a diagram of Global RAM Control.

FIG. 45 is a diagram of Global RAM to Buffer RAM.

FIG. 46 is a diagram of Buffer RAM to Global RAM.

FIG. 47 is a Global RAM Controller timing diagram. In this diagram, odd clock cycles are reserved for read operations, and
even cycles are reserved for write operations. As shown in the figure:

1) Write request is presented to controller.

2) Read request is presented to controller.

3) Write 0 data, write 0 address and write enable are presented to RAM while OddCyc is false.

4) Read 0 address is presented to RAM while OddCyc is true.
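By way of illustration only, the alternation of read and write slots may be sketched as a small scheduler. The sketch assumes that a request presented in one cycle can be serviced no earlier than the following cycle and that requests of each kind are serviced in arrival order; neither detail is taken from FIG. 47, and the function name schedule is hypothetical.

```python
from collections import deque


def schedule(requests, n_cycles):
    """requests: iterable of (arrival_cycle, kind, tag), kind being 'R' or 'W'.

    Returns a list of (cycle, kind, tag) in the order the RAM is accessed.
    """
    reads, writes, out = deque(), deque(), []
    pending = deque(sorted(requests))
    for cyc in range(n_cycles):
        # Accept requests presented in earlier cycles (assumed one-cycle delay).
        while pending and pending[0][0] < cyc:
            _, kind, tag = pending.popleft()
            (reads if kind == "R" else writes).append(tag)
        odd_cyc = bool(cyc & 1)
        # Odd cycles are reserved for reads, even cycles for writes.
        queue, kind = (reads, "R") if odd_cyc else (writes, "W")
        if queue:
            out.append((cyc, kind, queue.popleft()))
    return out
```

Since each kind of access has its own reserved slot, a burst of writes cannot delay a waiting read by more than one cycle in this model.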
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WO 2007130476 20071 115 CLAIMS 

1 A device comprising: a control block containing a combination of information representing the state of a Process; a 
processor that accesses the control block to read and update the information; and a control block manager that allocates 
the accessing of the control block by the processor. 

2. The device of claim 1 , wherein the process includes communication according to Transmission Control Protocol (TCP). 

3. The device of claim 1, wherein the control block contains information corresponding to TCP. 

4. The device of claim 1 , wherein the control block contains information corresponding to a TCP connection. 

5. The device of claim 1 , wherein the control block contains information corresponding to a TCP Control Block (TCB). 

6. The device of claim 1 , wherein the control block contains Information con-esponding to a transport layer of a 
communication protocol. 

7. The device of claim 1 , wherein the control block contains infomiation con^sponding to a network layer of a 

communication protocol. 

8. The device of claim 1 , wherein the control block contains information corresponding to a Media Access Control (MAC) 
layer of a communication protocol. 

9 The device of claim 1 , wherein the control block contains information corresponding to an upper layer of a 
communication protocol, the upper layer being higher than a transport layer. 

10. The device of claim 1 , wherein the control block contains infomiation con-esponding to at least three layers of a 
communication protocol. 

11 . The device of claim 1 , wherein the device provides a communication interface for a host. 

12. The device of claim 1. wherein the control block has been transfened to the device from a host. 

1 3. The device of claim 1 , wherein the control block is not established by the device. 

14. The device of claim 1, wherein the processor has a plurality of contexts, with each context representing a group of
resources available to the processor when operating within the context, and the control block manager allows only one of
the contexts to access the control block at one time.

15. A device comprising: a plurality of control blocks, each of the control blocks containing a combination of information
representing the state of a process; a processor that accesses the control blocks to read and update the information; and
a control block manager that allocates the accessing of the control blocks by the processor.

16. The device of claim 15, wherein the processor has a plurality of contexts, with each context representing a group of
resources available to the processor when operating within the context, and the control block manager allows, for each of
the control blocks, only one of the contexts to access that control block at one time.

17. The device of claim 15, wherein the process includes communication according to the Transmission Control Protocol
(TCP).

18. The device of claim 15, wherein each control block contains information corresponding to TCP.

19. The device of claim 15, wherein each control block contains information corresponding to a different TCP connection. 

20. The device of claim 15, wherein each control block contains information corresponding to a different TCP Control
Block (TCB).

21. The device of claim 15, wherein each control block contains information corresponding to a transport layer of a
communication protocol.

22. The device of claim 15, wherein each control block contains information corresponding to a network layer of a
communication protocol.

23. The device of claim 15, wherein each control block contains information corresponding to a Media Access Control 
(MAC) layer of a communication protocol. 

24. The device of claim 15, wherein each control block contains information corresponding to an upper layer of a
communication protocol, the upper layer being higher than a transport layer.

25. The device of claim 15, wherein each control block contains information corresponding to at least three layers of a
communication protocol. 

26. The device of claim 15, wherein the device provides a communication interface for a host. 

27. The device of claim 15, wherein at least one of the control blocks has been transferred to the device from a host.

28. The device of claim 15, wherein at least one of the control blocks is not established by the device.

29. The device of claim 15, wherein the control block manager manages storage of the control blocks in a memory. 

30. A device comprising: a control block containing a combination of information representing the state of a process; a
plurality of processors that access the control block to read and update the information; and a control block manager that
allocates the accessing of the control block by the processors.

31. The device of claim 30, wherein the process includes communication according to the Transmission Control Protocol 
(TCP). 

32. The device of claim 30, wherein the control block contains information corresponding to TCP. 

33. The device of claim 30, wherein the control block contains information corresponding to a TCP connection.

34. The device of claim 30, wherein the control block contains information corresponding to a TCP Control Block (TCB). 

35. The device of claim 30, wherein the device provides a communication interface for a host. 

36. The device of claim 30, wherein the control block contains information corresponding to a transport layer of a 
communication protocol. 

37. The device of claim 30, wherein the control block contains information corresponding to a network layer of a 
communication protocol. 

38. The device of claim 30, wherein the control block contains information corresponding to a Media Access Control
(MAC) layer of a communication protocol. 

39. The device of claim 30, wherein the control block contains information corresponding to an upper layer of a 
communication protocol, the upper layer being higher than a transport layer. 

40. The device of claim 30, wherein the control block contains information corresponding to at least three layers of a
communication protocol. 

41. The device of claim 30, wherein the processors are pipelined. 

42. The device of claim 30, wherein the processors share hardware, with each of the processors occupying a different 
phase at a single time. 

43. The device of claim 30, wherein the device provides a communication interface for a host.

44. The device of claim 30, wherein the control block has been transferred to the device from a host.

45. The device of claim 30, wherein the control block is not established by the device.

46. The device of claim 30, wherein the control block manager manages storage of the control block in a memory. 

47. The device of claim 30, wherein each of the processors has a plurality of contexts, with each context representing a
group of resources available to the processor when operating within the context, and the control block manager allows 
only one of the contexts to access the control block at one time. 

48. A device comprising: a plurality of control blocks, each of the control blocks containing a combination of information
representing the state of a process; a plurality of processors that access the control blocks to read and update the
information; and a control block manager that allocates the accessing of the control blocks by the processors.

49. The device of claim 48, wherein each of the processors has a plurality of contexts, with each context representing a
group of resources available to the processor when operating within the context, and the control block manager allows
only one of the contexts to access any one of the control blocks at one time.

50. The device of claim 48, wherein the control block manager manages storage of the control blocks in a memory.

51. The device of claim 48, wherein the process includes communication according to the Transmission Control Protocol
(TCP).

52. The device of claim 48, wherein each control block contains information corresponding to TCP.

53. The device of claim 48, wherein each control block contains information corresponding to a different TCP connection.

54. The device of claim 48, wherein each control block contains information corresponding to a different TCP Control
Block (TCB).

55. The device of claim 48, wherein the device provides a communication interface for a host.

56. The device of claim 48, wherein each control block contains information corresponding to a transport layer of a
communication protocol.

57. The device of claim 48, wherein each control block contains information corresponding to a network layer of a
communication protocol.

58. The device of claim 48, wherein each control block contains information corresponding to a Media Access Control
(MAC) layer of a communication protocol.

59. The device of claim 48, wherein each control block contains information corresponding to an upper layer of a
communication protocol, the upper layer being higher than a transport layer.

60. The device of claim 48, wherein each control block contains information corresponding to at least three layers of a
communication protocol.

61. The device of claim 48, wherein each control block is only accessed by one processor at a time.

62. The device of claim 48, wherein each control block contains information corresponding to a different TCP connection,
and each control block is only accessed by one processor at a time.

63. The device of claim 48, wherein the processors are pipelined.

64. The device of claim 48, wherein the processors share hardware, with each of the processors occupying a different
phase at a time.

65. The device of claim 48, wherein the device provides a communication interface for a host.

66. The device of claim 48, wherein at least one of the control blocks has been transferred to the device from a host.

67. The device of claim 48, wherein at least one of the control blocks is not established by the device. 

68. The device of claim 48, wherein the control block manager grants locks to the plurality of processors, each of the locks
being defined to allow access to a specific one of the control blocks by only one of the processors at a time, the control
block manager maintaining a queue of lock requests that have been made by all of the processors for each lock.

69. The device of claim 48, wherein the plurality of processors each has a plurality of contexts, with each context
representing a group of resources available to the processor when operating within the context, and the control block
manager grants locks to the plurality of processors, each of the locks being defined to allow access to a specific one of the
control blocks by only one of the contexts at a time.

70. The device of claim 69, wherein the control block manager maintains a queue of lock requests that have been made 
by all of the contexts for each lock. 

71. The device of claim 48, wherein the plurality of processors each has a plurality of contexts, with each context
representing a group of resources available to the processor when operating within the context, and the control block
manager allows, for each of the control blocks, only one of the contexts to access that control block at one time.

72. The device of claim 69 or 71, wherein the control block manager maintains a queue of requests to access each
control block that have been made by all of the contexts for each lock.
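
The locking arrangement recited in claims 68 to 72 can be pictured with a small software model. The following is only an illustrative sketch, not the claimed hardware: the class name `ControlBlockManager`, the methods `request` and `release`, and the first-in first-out ordering of waiting requests are all assumptions made for the example, since the claims require only that a queue of lock requests be maintained for each lock.

```python
from collections import deque


class ControlBlockManager:
    """Toy model: one lock per control block, with a queue of waiting requests.

    Names and FIFO ordering are assumptions made for illustration only.
    """

    def __init__(self):
        self.owner = {}    # tcb_id -> context currently holding the lock
        self.waiting = {}  # tcb_id -> deque of contexts waiting for the lock

    def request(self, tcb_id, context):
        """Return True if the lock is granted now; otherwise queue the request."""
        if tcb_id not in self.owner:
            self.owner[tcb_id] = context
            return True
        self.waiting.setdefault(tcb_id, deque()).append(context)
        return False

    def release(self, tcb_id, context):
        """Release the lock and pass it to the oldest waiting context, if any."""
        if self.owner.get(tcb_id) != context:
            raise ValueError("context does not hold this lock")
        queue = self.waiting.get(tcb_id)
        if queue:
            next_context = queue.popleft()
            self.owner[tcb_id] = next_context
            return next_context
        del self.owner[tcb_id]
        return None
```

In this model a context is any hashable value, for example a (processor, context) pair, so the same structure covers both the per-processor locks of claim 68 and the per-context locks of claim 69.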

73. A device comprising: a plurality of control blocks stored in a memory, each of the control blocks containing a
combination of information representing the state of a process; a plurality of processors that access the control blocks to
read and update the information; and a control block manager that manages storage of the control blocks in the memory
and allocates the accessing of the control blocks by the processors.

76. The device of claim 73, wherein the control block manager allocates the accessing of the control blocks by the
processors based at least in part upon a predetermined priority of functions provided by the processors.

77. The device of claim 73, wherein the control block manager allocates the accessing of the control blocks by the 
processors based at least in part upon a predetermined priority of contexts, wherein each context represents a group of
resources available to the processors when operating within the context. 
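
Allocation by predetermined priority, as recited in claim 77, might be modeled as below. This is a hedged sketch under stated assumptions: the class `PriorityArbiter`, the context names used in the usage note, the convention that a lower rank wins, and the tie-break by arrival order are all hypothetical and are not taken from the claims.

```python
import heapq
import itertools


class PriorityArbiter:
    """Toy arbiter that grants control block access by predetermined context priority."""

    def __init__(self, priority_of):
        # priority_of maps a context name to a rank; lower rank wins (an assumption).
        self.priority_of = priority_of
        self.pending = []                 # heap of (rank, arrival, context, tcb_id)
        self.counter = itertools.count()  # arrival order breaks ties within a rank

    def request(self, context, tcb_id):
        rank = self.priority_of[context]
        heapq.heappush(self.pending, (rank, next(self.counter), context, tcb_id))

    def grant_next(self, locked):
        """Grant the highest-priority request whose control block is not in `locked`."""
        skipped = []
        granted = None
        while self.pending:
            entry = heapq.heappop(self.pending)
            if entry[3] in locked:
                skipped.append(entry)  # control block busy; keep the request pending
                continue
            granted = (entry[2], entry[3])
            break
        for entry in skipped:
            heapq.heappush(self.pending, entry)
        return granted
```

For example, with hypothetical ranks `{"rcv": 0, "xmt": 1, "utility": 2}`, a pending "rcv" request for a locked control block is passed over in favor of an "xmt" request for a free one, and is granted once its control block is released.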

78. A device comprising: a queue that stores packets that correspond to a single media access control (MAC) address;
parser hardware that reads the network and transport header information of each packet stored in the queue, including
determining a socket for each packet having Internet Protocol (IP) and Transmission Control Protocol (TCP) headers; and
socket detector hardware that determines whether each socket matches a TCP Control Block (TCB) that is stored on a
memory accessible by the socket detector.

79. The device of claim 78, wherein the TCB was established by a CPU. 

80. The device of claim 78, wherein the IP addresses and TCP ports of each packet that has an IP header and a TCP
header are stored in a FIFO that is accessible by the socket detector hardware.

81. The device of claim 78, wherein the socket detector hardware employs a hash of the IP addresses and TCP ports of
each packet that has an IP header and a TCP header to determine a corresponding TCB.

82. The device of claim 78, further comprising a processor that updates the TCB. 

83. The device of claim 78, wherein the socket detector hardware creates a receive event descriptor that is stored in a
receive event queue, the receive event descriptor including a TCB identifier (TCBID) that identifies the TCB to which the
packet corresponds, and a Receive Buffer ID that identifies where in memory the packet is stored.
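
The socket matching of claims 78 to 83 can be illustrated in software. The sketch below is not the claimed hardware and rests on several assumptions: the names `Socket`, `ReceiveEvent`, and `SocketDetector` are hypothetical, CRC-32 stands in for the unspecified hash, and collisions are resolved by comparing the full socket within a bucket.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """IP addresses and TCP ports parsed from a packet's headers."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


@dataclass
class ReceiveEvent:
    """Receive event descriptor: which TCB matched and where the packet is buffered."""
    tcb_id: int
    buffer_id: int


class SocketDetector:
    """Toy hash-table lookup from socket to TCB identifier."""

    def __init__(self, buckets=1024):
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, sock):
        # CRC-32 is an assumed stand-in; the claims do not name a hash function.
        key = f"{sock.src_ip}|{sock.dst_ip}|{sock.src_port}|{sock.dst_port}".encode()
        return zlib.crc32(key) % len(self.buckets)

    def add_tcb(self, sock, tcb_id):
        self.buckets[self._index(sock)].append((sock, tcb_id))

    def detect(self, sock, buffer_id):
        """Return a ReceiveEvent if the socket matches a stored TCB, else None."""
        for stored, tcb_id in self.buckets[self._index(sock)]:
            if stored == sock:
                return ReceiveEvent(tcb_id, buffer_id)
        return None
```

A packet whose socket matches no stored TCB yields `None` in this model; what the device does with such packets is outside the scope of the sketch.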

84. A device comprising: a plurality of processors that are pipelined, such that the processors share hardware with each of 
the processors occupying a different pipeline phase at a time, the processors adapted to provide a plurality of functions; 
and a lock manager that grants locks to the plurality of processors, each of the locks being defined to allow access to a 
specific one of the functions by only one of the processors at a time, the lock manager maintaining a queue of lock 
requests by all of the processors for each lock. 

85. The device of claim 84, wherein the processors are adapted to perform protocol processing. 

86. The device of claim 84, wherein the plurality of functions each corresponds to a context, with each context
representing a group of resources available to the processor when operating within the context. 






